Newbie down with 4.3.3 Kernel Panic

August 11, 2008

Hi,

This is my first message.

I just attempted to build my first unRAID server after having read this forum in detail.

My goal is a 15-drive system, but it is already not working properly with just three 1TB drives:

Every time I build parity, I consistently get a kernel panic somewhere in the middle of the process.

I have run Memtest several times, for several hours each time, and it always passes with no errors at all.

Here is my config:

unRAID 4.3.3 Pro

Cooler Master Midi Tower Centurion 590

Gigabyte GA-EP35-DS3R

Intel BX80557430 CELERON 430 1.8GHZ

Arctic Cooling ARC-ALPINE-7-PRO Cooler

Kingston KVR800D2N6K2/2G DDR2 2GB PC800 NON-ECC (Kit of 2/ 2x1GB) DIMM (SDRAM-DDR2, 1.8V, CL6)

Asus 90-C1CJS5-H0UAY00Z EAH2400PRO MAGIC/HTP/256M PCIe x16/ ATI/ 2400/ 256 MB

Enermax Modu82+, EMD525AWT, 525W, cable management, 84%-88% efficiency

2 x Promise F29S3T4100 SATA300 TX4 PCI / 4-Channel SATA 3.0 Gb/s

3 x ICY Dock MB455 5 x SATA I/II Multi Drive Hot-Swap Module

1 x Samsung HD103UJ F1 1TB Parity drive

2 x Western Digital WD10EACS 1TB

Sony Micro Vault Tiny 4GB USB key

And here is a photo of the error message I get (sorry for the poor quality):

Code: Bad EIP value

EIP: ...

Kernel panic - not syncing: Fatal exception in interrupt

What do you think might cause this Kernel Panic?

What is EIP btw?

Many thanks in advance for helping a frustrated newbie!

MA

August 11, 2008

It may be worthwhile to boot up, capture a syslog and post it.

http://lime-technology.com/wiki/index.php?title=Troubleshooting
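The wiki page above describes the supported procedure. As a rough sketch only, assuming the live log is at `/var/log/syslog` and the flash drive is mounted at `/boot` (both are assumptions, not stated in this thread), the "save a copy where it survives a reboot" step could look like this. `save_syslog` is a hypothetical helper, and the dated filename simply mirrors the `syslog-2008-08-11.txt` naming used later in the thread.

```python
import shutil
import time
from pathlib import Path


def save_syslog(src="/var/log/syslog", dest_dir="/boot"):
    """Copy the live syslog to a dated text file so it survives a reboot.

    Assumptions: the log lives at `src` and the flash drive is mounted at
    `dest_dir`; adjust both for your own system.
    """
    stamp = time.strftime("%Y-%m-%d")              # e.g. 2008-08-11
    dest = Path(dest_dir) / f"syslog-{stamp}.txt"
    shutil.copyfile(src, dest)                     # plain byte-for-byte copy
    return dest
```

The copy is taken right after boot (or right after a problem, if the box is still up), so that it can be attached to a post.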

August 11, 2008

10 minutes of googling indicates this is probably hardware related. You may want to try and reseat cards, reconnect cables, etc.

One person had a similar problem and it was solved by using an add-in NIC, though I agree that a syslog capture and review should be done first.

Bill

August 11, 2008

Author

I should add that I tried 4.3.3, 4.3.2 and 4.2.4, and a kernel panic happened during the parity build every time except once. That time the parity build finished, but a kernel panic occurred later, halfway through copying the contents of a 1TB drive to the unRAID server over the network.

Here is the syslog after a fresh boot of 4.2.4:

http://euuff.com/unraid/syslog-2008-08-11.txt

August 11, 2008

Even though you ran memtest, it did not test the RAM while the disks were being accessed at the same time. Try swapping the positions of your two strips of RAM. Can't hurt, might help. Make sure you have the voltage for the RAM set correctly in the BIOS. Same for any RAM timing settings.

Joe L.

August 11, 2008

On the Hardware Compatibility wiki page, there is a note for your board, advising 'Upgrade to latest BIOS'. Strongly recommended.

According to your syslog, you do not appear to have configured the BIOS settings for AHCI, which is recommended. That will change which driver is controlling the drives attached to the board. You also do not appear to have SMART enabled in the BIOS, which is also highly recommended.
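One way to confirm which driver claimed the onboard ports, before and after switching the BIOS to AHCI, is to scan the syslog for the driver names. A minimal sketch, under two assumptions not stated in the thread: on Intel ICH chipsets, legacy IDE mode is normally handled by the `ata_piix` driver and AHCI mode by the `ahci` driver, and each driver prefixes its probe messages with its own name. `sata_drivers` is a hypothetical helper, and the sample lines are illustrative, not copied from the posted syslogs.

```python
import re

# Assumption: probe messages look like "kernel: ata_piix 0000:00:1f.2: ..."
# or "kernel: ahci 0000:00:1f.2: ...".
DRIVER_RE = re.compile(r"kernel: (ahci|ata_piix)\b")


def sata_drivers(lines):
    """Return the set of onboard SATA driver names seen in syslog lines."""
    found = set()
    for line in lines:
        match = DRIVER_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


# Illustrative lines only ("Tower" is a placeholder hostname).
before = ["Aug 11 10:00:01 Tower kernel: ata_piix 0000:00:1f.2: version 2.12"]
after = ["Aug 12 10:00:01 Tower kernel: ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 6 ports 3 Gbps 0x3f impl SATA mode"]
print(sata_drivers(before))  # {'ata_piix'}
print(sata_drivers(after))   # {'ahci'}
```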

Until this is resolved, I would recommend temporarily pulling both of the Promise TX4s.

All 3 drives are reporting 'exception Emask' errors, of the '(device error)' type. The errors appear to be resolved, and I'm unsure of the significance, but it is NOT normal.
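To see whether these errors stop after a change (BIOS update, AHCI, reseating cables), it helps to count them per port in each new syslog. A minimal sketch: `emask_summary` is a hypothetical helper, and the two sample lines only illustrate the general shape of libata error messages; they are not taken from the posted syslog.

```python
import re
from collections import Counter

# libata tags its messages with the port, e.g. "ata3.00:"; capture "ata3".
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def emask_summary(lines):
    """Count 'exception Emask' and '(device error)' lines per ATA port."""
    exceptions, device_errors = Counter(), Counter()
    for line in lines:
        match = PORT_RE.search(line)
        if not match:
            continue
        port = match.group(1)
        if "exception Emask" in line:
            exceptions[port] += 1
        if "(device error)" in line:
            device_errors[port] += 1
    return exceptions, device_errors


# Illustrative lines only.
sample = [
    "Aug 11 10:00:01 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0",
    "Aug 11 10:00:01 Tower kernel: ata3.00: res 51/04:00:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)",
]
exceptions, device_errors = emask_summary(sample)
print(dict(exceptions), dict(device_errors))  # {'ata3': 1} {'ata3': 1}
```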

As to the EIP, don't worry about that specifically. Any kernel panic is going to dump the registers, that being one of them, and there is usually nothing usable in the register values for normal users. There may sometimes be a clue in the process names, and in the 'mangled' names in the call trace.

August 11, 2008

If the suggestions above are not enough, you should try some of the boot options on the Boot Codes wiki page. But hopefully, all you need is a BIOS upgrade.

August 12, 2008

Author

Following your advice, I have upgraded the BIOS from version F2 to the latest one, version F3, which by the way has apparently only been available since last July:

http://www.gigabyte.com.tw/Support/Motherboard/BIOS_Model.aspx?ProductID=2743

I also set the motherboard SATA controllers to AHCI in the BIOS.

The parity build has now completed successfully (under 4.2.4).

Also, there are no more 'exception Emask' errors, as you can see in this new syslog:

http://euuff.com/unraid/syslog-2008-08-12.txt

I have not removed the two Promise TX4s yet, to see whether it would work with them installed.

Thank you all for your support; I would not have made any progress on this alone!

As next steps, I will build parity under 4.3.3 and transfer several TB of data to see whether my unRAID server works properly.

Btw, what do you think of my power supply:

http://www.enermax.com.tw/english/product_Display1.asp?PrID=110

I was attracted by its high efficiency, but will it be able to handle up to 15 drives?

I was planning to add drives one at a time.

Thanks again!

MA

August 13, 2008

I would be a little worried about the power supply, since it has three 12-volt rails. You are better off with a power supply that has a single 12-volt rail: all of your amperage is on that one rail, so you get all of the available power.

Phil

August 13, 2008

That PSU sounds good. Most triple-rail PSUs in that size have 18-20 amps on the 12 V rails; this one has 25. Also, many PSUs with separate rails are not truly separate: if you overload one because of the way it is cabled, it can draw on the other rails.
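To put rough numbers on the rail question: drives draw their peak 12 V current while spinning up, so a quick budget is drives × spin-up current plus everything else on 12 V. The per-drive and "rest of system" figures below are assumptions for illustration only (about 2 A per 3.5-inch drive at spin-up and a 5 A allowance for the rest); check the drive datasheets and the PSU label for real values. The 25 A and 62 A ratings are the ones quoted in this thread.

```python
def spinup_load_amps(drives, amps_per_drive=2.0, other_amps=5.0):
    """Estimated peak 12 V current when all drives spin up at once.

    amps_per_drive and other_amps are illustrative assumptions, not measured values.
    """
    return drives * amps_per_drive + other_amps


def fits_on_rail(drives, rail_amps, **assumptions):
    """True if the estimated peak load stays within one rail's rating."""
    return spinup_load_amps(drives, **assumptions) <= rail_amps


for rail in (25.0, 62.0):
    print(f"15 drives on one {rail:.0f} A rail:", fits_on_rail(15, rail))
```

On a multi-rail unit, the same check applies per rail to whichever drives are cabled to it, unless, as noted above, the rails are not truly separate.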

August 17, 2008

Author

Hi again,

So far I have encountered one more kernel panic, but overall the system now looks stable.

Now, to test its redundancy, I'm trying to remove one data drive, erase it, and then put it back to see whether it is rebuilt correctly:

- I transferred 2TB of data onto the two data drives. They were practically full.

- Parity was always OK, with 0 errors.

- I then shut down the server and removed the first data disk.

- I formatted this disk on another workstation with a Mac OS X file system.

- I then put the formatted disk back and booted unRAID.

- The web console displayed the unformatted disk.

- Then I clicked Format to format the unformatted drive.

- But now I am stuck: the drive is formatted and empty, but no rebuild has started, and none can be started.

Am I doing something wrong?

Is the fact that I used the same drive for my test (and therefore the same serial number) the reason unRAID does not start rebuilding it?

Thanks, MA

August 17, 2008

Yes. The unRAID array has no way of knowing what you did to the disk while it was not looking (you "erased" it, or rather, it no longer has a reiserfs file system to mount).

At this point, your array has forgotten whatever it had on the disk you re-formatted, and because you asked it to format the disk, it has also re-written parity to match.
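Why the rebuild would restore an empty disk: every write that goes through the array also updates parity. A simplified model with single XOR parity over byte strings (an illustration, not unRAID's actual code; the format is modeled here as overwriting the disk with zeros, and `xor_blocks` and `write_through_array` are hypothetical helpers):

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_through_array(disks, parity, index, new_data):
    """Write new_data to disks[index] and return the updated parity.

    Read-modify-write: new parity = old parity ^ old data ^ new data.
    """
    parity = xor_blocks(parity, disks[index], new_data)
    disks[index] = new_data
    return parity


disks = [b"\x0f\xf0", b"\xaa\x55"]      # two toy data disks
parity = xor_blocks(*disks)

# "Format" disk 1 through the array (modeled here as zeroing it).
parity = write_through_array(disks, parity, 0, b"\x00\x00")

# Rebuilding disk 1 from parity and disk 2 now yields the formatted contents.
print(xor_blocks(parity, disks[1]))  # b'\x00\x00'
```

Once the format has gone through the array, parity describes the empty disk, so there is nothing left to reconstruct the old files from.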

To test, shut down the array and un-assign the disk you want to test. Then power back up. It will detect the "missing" disk and show that it is supplying the old contents via parity. When you un-assign it and then restart the array without it, the array's knowledge of the disk's model/serial is erased by an update to the super.dat file on the flash drive.

Now stop the array, re-assign the drive, and restart the array; the rebuild will start (you will probably need to press Start and check an "I'm sure" checkbox).

The key is that the super.dat file on the flash drive sees either a missing disk, or a different disk in a given slot.
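The rule described above can be written as a small decision function. This is a hypothetical model of the behavior as described in this post (compare the identity recorded for a slot with the disk detected there), not the actual super.dat logic; `slot_action` and the serial numbers are made up for illustration.

```python
from typing import Optional


def slot_action(recorded: Optional[str], detected: Optional[str]) -> str:
    """Decide what the array does with one slot at start-up (hypothetical model).

    recorded: serial remembered for the slot (None once the slot has been
              un-assigned and the array restarted without it).
    detected: serial of the disk currently assigned (None if no disk).
    """
    if detected is None:
        return "emulate"   # missing disk: contents are served from parity
    if recorded is None or recorded != detected:
        return "rebuild"   # forgotten or different disk: reconstruct onto it
    return "normal"        # same disk as before: no rebuild, whatever it now holds


# The first test in this thread: same disk, wiped offline, never un-assigned.
print(slot_action("WD-EXAMPLE-1", "WD-EXAMPLE-1"))  # normal
```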

Joe L.

August 17, 2008

Author

Hi Joe,

I did not un-assign the drive on the Devices page; I simply stopped the array, shut down the server, and then physically removed the drive to format it and erase all its content.

In fact, unRAID now appears to be rebuilding the disk: I stopped the array, un-assigned the disk, restarted the array, stopped the array, re-assigned the disk, and then the rebuild option appeared!

Thanks for your input,

M

August 17, 2008

Yes, it is rebuilding it, but you had already re-formatted it, so odds are it will rebuild it as freshly formatted, with no files on it.

In the future, never press the Format button on a disk you know has data. Never press the "Restore" button unless you are permanently removing a disk from the array and do not intend to replace it with another in the same slot within a short interval.

When you replace a failed disk, there is no need to format it: unRAID will rebuild the old contents (including the formatting) when you restart the array.
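With single parity, the contents of any one missing disk are the XOR of the parity disk and all surviving data disks. That is what lets the array both serve a missing disk's data on the fly and later rebuild it onto a replacement, formatting included, since the file system is just more bytes on the disk. A toy sketch under a simplified XOR model (the disk contents and the `reconstruct` helper are invented for illustration):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(parity, surviving):
    """Recover the single missing disk from parity plus the surviving disks."""
    return xor_blocks([parity, *surviving])


disks = [b"movie", b"music", b"photo"]   # three toy data disks
parity = xor_blocks(disks)

# Disk 2 (b"music") is removed; rebuild it from the others.
print(reconstruct(parity, [disks[0], disks[2]]))  # b'music'
```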

Joe L.

August 19, 2008

Author

Thanks Joe,

You are right, the data disk was rebuilt empty. I think I now understand the mistakes I made in my test: I should have un-assigned the disk before removing it from the array, and I should not have formatted it with unRAID once it was reinserted.

I did this test a second time, and unRAID rebuilt the data disk with all its data perfectly. It was great, by the way, to see that before the rebuild, unRAID stayed online and delivered the missing data with no problem (in this case a collection of HD movies), as if the missing disk were physically present!

MA

August 30, 2008

Author

Hi,

I now have six 1TB disks working without any problems and with no parity errors.

However, I'm currently not able to add a new one. Clearing a newly added disk starts, but apparently never finishes; it runs for hours or even days.
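For background on what the clearing step is for: as far as I know, unRAID fills a newly added disk with zeros before it joins a parity-protected array, because XOR with zero leaves the existing parity unchanged, so parity stays valid without being recomputed. A toy check of that property (`parity_still_valid` is a hypothetical helper):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_still_valid(existing_disks, new_disk):
    """True if adding new_disk leaves the existing parity correct."""
    old_parity = xor_blocks(existing_disks)
    return xor_blocks([*existing_disks, new_disk]) == old_parity


existing = [b"\x12\x34", b"\xab\xcd"]
print(parity_still_valid(existing, bytes(2)))      # True: fully cleared disk
print(parity_still_valid(existing, b"\x00\x01"))   # False: one non-zero byte
```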

I have tried two different 1TB disks, a Hitachi and a WD. With the Hitachi, the clearing percentage was even erratic: for instance, it displayed 64%, then 35% immediately after I refreshed the page, then 64% again or another value, and so on.

Looking at the Hitachi clearing syslog (http://www.euuff.com/unraid/syslog-2008-08-28.txt), it seems to start the clearing process over and over:

...

Aug 27 21:31:10 unraid1 emhttp: ... clearing 97% complete

Aug 27 21:57:35 unraid1 emhttp: ... clearing 98% complete

Aug 27 22:24:41 unraid1 emhttp: ... clearing 99% complete

Aug 27 22:51:21 unraid1 emhttp: ... clearing 100% complete

Aug 27 22:51:21 unraid1 emhttp: ... syncing

Aug 27 22:51:43 unraid1 kernel: mdcmd (27): start

Aug 27 22:51:43 unraid1 kernel: md6: new disk

Aug 27 22:51:43 unraid1 kernel: md: start_array: state 1 does't match PROTECTED_EXPANSION

Aug 27 22:51:43 unraid1 kernel: md6: new disk

Aug 27 22:51:43 unraid1 emhttp: ... clearing 0% complete

Aug 27 22:51:43 unraid1 emhttp: ... clearing 1% complete

Aug 27 22:51:43 unraid1 emhttp: ... clearing 2% complete

...

or it is interrupted:

...

Aug 28 00:13:42 unraid1 emhttp: ... clearing 93% complete

Aug 28 00:13:42 unraid1 emhttp: ... clearing 94% complete

Aug 28 00:13:42 unraid1 emhttp: ... clearing 95% complete

Aug 28 00:13:42 unraid1 kernel: md6: new disk

Aug 28 00:13:43 unraid1 kernel: md6: new disk

Aug 28 00:13:43 unraid1 emhttp: shcmd (70): killall -w smbd nmbd

Aug 28 00:13:44 unraid1 emhttp: shcmd (71): /usr/sbin/nmbd -D

Aug 28 00:13:44 unraid1 emhttp: shcmd (72): /usr/sbin/smbd -D

Aug 28 00:13:44 unraid1 emhttp: clearing disk6 ...

Aug 28 00:13:45 unraid1 emhttp: ... clearing 1% complete

Aug 28 00:13:45 unraid1 emhttp: ... clearing 2% complete

Aug 28 00:13:45 unraid1 emhttp: ... clearing 3% complete

...

Looking at the WD clearing syslog (http://www.euuff.com/unraid/syslog-2008-08-30.txt), it seems the clearing is often interrupted, and the message "kernel: md7: new disk" also pops up in the middle:

...

Aug 30 07:38:19 unraid1 emhttp: ... clearing 53% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 54% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 55% complete

Aug 30 07:38:19 unraid1 kernel: md7: new disk

Aug 30 07:45:38 unraid1 emhttp: ... clearing 56% complete

Aug 30 08:01:48 unraid1 emhttp: ... clearing 57% complete

Aug 30 08:17:20 unraid1 emhttp: ... clearing 58% complete

Aug 30 08:33:04 unraid1 emhttp: ... clearing 59% complete

...
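Restarts like these can be spotted mechanically: the clearing percentage should only go up, so any drop marks a restart. A small sketch that scans syslog lines for the `clearing N% complete` messages quoted above (`find_restarts` is a hypothetical helper, not an unRAID tool):

```python
import re

PROGRESS_RE = re.compile(r"clearing (\d+)% complete")


def find_restarts(lines):
    """Return (line_index, previous_pct, new_pct) wherever the percentage drops."""
    restarts = []
    prev = None
    for i, line in enumerate(lines):
        match = PROGRESS_RE.search(line)
        if not match:
            continue
        pct = int(match.group(1))
        if prev is not None and pct < prev:
            restarts.append((i, prev, pct))
        prev = pct
    return restarts


# Lines from the Hitachi excerpt above.
excerpt = [
    "Aug 27 22:24:41 unraid1 emhttp: ... clearing 99% complete",
    "Aug 27 22:51:21 unraid1 emhttp: ... clearing 100% complete",
    "Aug 27 22:51:43 unraid1 kernel: md6: new disk",
    "Aug 27 22:51:43 unraid1 emhttp: ... clearing 0% complete",
]
print(find_restarts(excerpt))  # [(3, 100, 0)]
```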

What is the problem?

Maybe it's related to the fact that the drive is connected to the motherboard's second SATA controller, the 2-port one (the six working drives are connected to the motherboard's other, 6-port controller)?

MA

August 30, 2008

I don't know what the trouble is, but I can help with the "kernel: md7: new disk" messages.

I found this message weird too, and finally discovered that it appears each time you request an unRAID page in Internet Explorer. If you refresh the main page, for example, it logs one line for each new drive that is currently clearing. So if you refresh the page 10 times to check how it is going, you will get those lines 10 times.
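If each page request during clearing logs one `new disk` line, as described above, counting those lines (including syslog's `last message repeated N times` shorthand) gives a rough count of page loads. A sketch, with `count_new_disk` as a hypothetical helper; the sample lines come from the WD syslog excerpt posted in this thread.

```python
import re

NEW_DISK_RE = re.compile(r"kernel: (md\d+): new disk")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_new_disk(lines):
    """Count 'mdN: new disk' messages per md device, expanding repeats."""
    counts = {}
    last = None  # md device named by the previous line, if any
    for line in lines:
        match = NEW_DISK_RE.search(line)
        if match:
            last = match.group(1)
            counts[last] = counts.get(last, 0) + 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last is not None:
            # syslog collapsed identical messages into this one line
            counts[last] += int(repeat.group(1))
        last = None
    return counts


sample = [
    "Aug 30 07:38:14 unraid1 kernel: md7: new disk",
    "Aug 30 07:38:17 unraid1 last message repeated 2 times",
    "Aug 30 07:38:17 unraid1 emhttp: shcmd (55): killall -w smbd nmbd",
]
print(count_new_disk(sample))  # {'md7': 3}
```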

August 30, 2008

Author

Very interesting, because it seems that, very often, after I refreshed the main web page to check the progress percentage, the clearing process restarted:

...

Aug 30 07:13:35 unraid1 emhttp: ... clearing 53% complete

Aug 30 07:23:22 unraid1 emhttp: ... clearing 54% complete

Aug 30 07:33:13 unraid1 emhttp: ... clearing 55% complete

Aug 30 07:38:11 unraid1 kernel: md7: new disk

Aug 30 07:38:11 unraid1 emhttp: shcmd (52): killall -w smbd nmbd

Aug 30 07:38:12 unraid1 emhttp: shcmd (53): /usr/sbin/nmbd -D

Aug 30 07:38:12 unraid1 emhttp: shcmd (54): /usr/sbin/smbd -D

Aug 30 07:38:13 unraid1 emhttp: clearing disk7 ...

Aug 30 07:38:13 unraid1 emhttp: ... clearing 1% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 2% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 3% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 4% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 5% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 6% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 7% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 8% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 9% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 10% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 11% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 12% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 13% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 14% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 15% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 16% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 17% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 18% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 19% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 20% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 21% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 22% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 23% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 24% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 25% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 26% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 27% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 28% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 29% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 30% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 31% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 32% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 33% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 34% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 35% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 36% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 37% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 38% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 39% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 40% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 41% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 42% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 43% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 44% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 45% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 46% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 47% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 48% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 49% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 50% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 51% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 52% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 53% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 54% complete

Aug 30 07:38:13 unraid1 emhttp: ... clearing 55% complete

Aug 30 07:38:14 unraid1 kernel: md7: new disk

Aug 30 07:38:17 unraid1 last message repeated 2 times

Aug 30 07:38:17 unraid1 emhttp: shcmd (55): killall -w smbd nmbd

Aug 30 07:38:18 unraid1 emhttp: shcmd (56): /usr/sbin/nmbd -D

Aug 30 07:38:18 unraid1 emhttp: shcmd (57): /usr/sbin/smbd -D

Aug 30 07:38:18 unraid1 emhttp: clearing disk7 ...

Aug 30 07:38:18 unraid1 emhttp: ... clearing 1% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 2% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 3% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 4% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 5% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 6% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 7% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 8% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 9% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 10% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 11% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 12% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 13% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 14% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 15% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 16% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 17% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 18% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 19% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 20% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 21% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 22% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 23% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 24% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 25% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 26% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 27% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 28% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 29% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 30% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 31% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 32% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 33% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 34% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 35% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 36% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 37% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 38% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 39% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 40% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 41% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 42% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 43% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 44% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 45% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 34% complete

Aug 30 07:38:18 unraid1 emhttp: ... clearing 47% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 48% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 49% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 50% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 51% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 52% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 53% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 54% complete

Aug 30 07:38:19 unraid1 emhttp: ... clearing 55% complete

Aug 30 07:38:19 unraid1 kernel: md7: new disk

Aug 30 07:45:38 unraid1 emhttp: ... clearing 56% complete

Aug 30 08:01:48 unraid1 emhttp: ... clearing 57% complete

Aug 30 08:17:20 unraid1 emhttp: ... clearing 58% complete

...

But probably not really: if you look at the timestamps, it seems to "count" from the beginning very quickly, which could explain why the percentage counter on the page was erratic and suddenly displayed smaller values.
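The "fast count" shows up clearly if you tally clearing-progress lines per timestamp: normal progress logs one percent every several minutes, so dozens of progress lines within a single second cannot be real clearing work (55% of a 1TB disk is not cleared in one second). A sketch; `progress_bursts` and its threshold of 5 lines are arbitrary, hypothetical choices.

```python
import re
from collections import Counter

# Syslog timestamp (e.g. "Aug 30 07:38:13") followed by a clearing-progress message.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) .*clearing (\d+)% complete")


def progress_bursts(lines, threshold=5):
    """Return {timestamp: count} for seconds with at least `threshold` progress lines."""
    per_second = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            per_second[match.group(1)] += 1
    return {ts: n for ts, n in per_second.items() if n >= threshold}


# Rebuilds the pattern of the WD excerpt above: 1%-55% in one second, then normal pace.
burst = [f"Aug 30 07:38:13 unraid1 emhttp: ... clearing {p}% complete" for p in range(1, 56)]
burst.append("Aug 30 07:45:38 unraid1 emhttp: ... clearing 56% complete")
print(progress_bursts(burst))  # {'Aug 30 07:38:13': 55}
```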

September 1, 2008

Is there any chance you are running the new unmenu tool?

Something is interfering with unRAID and restarting the Clearing process, and not just restarting it but adding additional work, perhaps another clearing thread. The Clearing operation reports its progress at each single percent, at a very regular interval that increases slowly (linearly) as it approaches the end of the drive. In your syslogs, after each interruption and restart, the Clearing picks up right on schedule where it left off, but then takes 20 to 40% longer per percent. And each restart lengthens that interval by another 20 to 40% of the previous interval. This makes me wonder if it is starting a fresh Clearing thread each time, so that multiple threads are each trying to clear and write each sector. I don't know if the Linux top command would help by showing an increasing number of threads. Something is definitely wrong here.
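The interval argument can be checked directly from the timestamps. Using the WD excerpt quoted earlier in the thread, the time per percent is about 590 s before the restarts at 07:38 and about 930-970 s after them. A sketch (`seconds_per_percent` is a hypothetical helper; syslog omits the year, so 2008, the year of this thread, is supplied):

```python
from datetime import datetime


def seconds_per_percent(stamps, year=2008):
    """Seconds between consecutive 'clearing N% complete' timestamps."""
    times = [datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


before = ["Aug 30 07:13:35", "Aug 30 07:23:22", "Aug 30 07:33:13"]  # 53%-55%
after = ["Aug 30 07:45:38", "Aug 30 08:01:48", "Aug 30 08:17:20"]   # 56%-58%
print(seconds_per_percent(before))  # [587.0, 591.0]
print(seconds_per_percent(after))   # [970.0, 932.0]
```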

September 1, 2008

I know for sure that it is not safe to check status using the "mdcmd status" command during the clearing of drives. So any utility that monitors status, other than the one Tom built into his emhttp web interface, can cause issues. The clearing process is NOT thread safe (see this thread: http://lime-technology.com/forum/index.php?topic=1569.0). The symptoms I was having were not identical: I did not crash, but I never got to the "formatting" step.

Joe L.

September 2, 2008

Author

Thank you RobJ and Joe L. for your answers.

I only used the standard web interface, refreshing the page to check clearing progress. I don't think I had a telnet session open in parallel (or maybe I was just logged in), and I'm not aware of the unmenu tool.

Some progress: I moved the WD disk to the next slot, which is the first SATA port on my Promise TX4 SATA controllers. The clearing has completed. It seems there is an issue with the Gigabyte 2-port SATA controller, while the Intel ICH9R 6-port SATA controller is working properly. Hopefully it is just a BIOS setting issue.

Has anyone experienced this with the Gigabyte GA-EP35-DS3R motherboard?

Thx,

MA

September 3, 2008

Ah ah, I have the same board and still lots of trouble. Clearing stops, or a kernel panic hits while copying, after every 3 or 6 hours of hard work.

I finally unplugged everything except the ICH9R, so I kept just the six Intel SATA ports. At first I had one problem, then everything worked. I could copy all my files without any more issues. I even enabled the parity drive, and the parity build worked like a charm: less than 5 hours for 6 drives at nearly 60 MB/s (including parity, so 5 TB of data).

As it seems to work better with just 6 drives, I suspected the power supply, so I should have a new Corsair 750W tonight with a single 12 V rail that can handle 62 A.

But from what you say, maybe the trouble is just the Linux driver for the integrated Gigabyte 2-port SATA controller. That greatly narrows down the problem. Tonight (if I have time), after changing the power supply (so it definitely can't be the cause), I will enable the Gigabyte SATA controller in the BIOS, add 2 more disks, and let the parity build run. If it hangs or goes crazy, then that is the culprit.

If the trouble is that Gigabyte SATA controller (that is its name, so I don't know the brand of the controller underneath; I bet Gigabyte didn't build it), I hope there is a Linux fix so it can work. Otherwise I lose 2 TB, and the two PCI Promise SATA cards are just too slow when building or checking parity, which ends up happening quite often (around 18 hours with 15 drives, 7 of them on the PCI Promise cards).
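A back-of-the-envelope check of these timings: a parity build reads all drives in parallel, so its duration is roughly the size of the largest drive divided by the slowest per-drive rate. Assumptions for this sketch: the ~60 MB/s figure reported above for the onboard ports, and both Promise cards sharing one conventional 32-bit/33 MHz PCI bus with a theoretical peak of about 133 MB/s, split across the 7 drives on it. `per_drive_rate` and `parity_hours` are hypothetical helpers.

```python
def per_drive_rate(drive_mb_s, bus_mb_s=None, drives_on_bus=1):
    """Effective MB/s per drive, capped by a shared bus if there is one."""
    if bus_mb_s is None:
        return drive_mb_s
    return min(drive_mb_s, bus_mb_s / drives_on_bus)


def parity_hours(largest_drive_bytes, slowest_mb_s):
    """One pass over the largest drive at the slowest per-drive rate."""
    return largest_drive_bytes / (slowest_mb_s * 1e6) / 3600


TB = 1e12
onboard = per_drive_rate(60)                              # ~60 MB/s reported on ICH9R
pci = per_drive_rate(60, bus_mb_s=133, drives_on_bus=7)   # 133 / 7 = 19 MB/s
print(f"{parity_hours(TB, onboard):.1f} h")  # 4.6 h
print(f"{parity_hours(TB, pci):.1f} h")      # 14.6 h
```

Real PCI throughput is below the theoretical peak, so real builds on the shared bus take longer than this estimate, which is in line with the ~18 hours mentioned.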

It still sounds weird, though, as this motherboard is listed as working on the Hardware Compatibility page (I have the latest F3 BIOS). But it may have been tested with some other, older unRAID version.

I will post the results of my test when I have them.

September 3, 2008

But it may have been tested with some other unraid older version

Very good point.... the HCL should definitely contain that info!

September 3, 2008

The trouble with the HCL is the lack of standards or expectations. What defines "working"? Do I need to test every feature and every combination of hardware before I can declare it "compatible"?

I think the list is a general guideline, but we should definitely update it when we get evidence that "working" isn't universal.

Bill

September 4, 2008

Just to keep you informed here (I have another thread running about my tests to solve my problems): I tried with the 8 internal SATA ports of this motherboard (including the 2 Gigabyte ones) and it worked. I could build the parity disk without trouble. I just don't know what to think, as it now works where it failed previously.
