Error when adding more than 15 drives



Has anyone had problems adding more than 15 drives? I keep getting the following error in the syslog:

 

May 15 07:13:00 Tower2 kernel: ReiserFS: md12: checking transaction log (md12)
May 15 07:13:00 Tower2 kernel: BUG: unable to handle kernel paging request at 6d614e8e
May 15 07:13:00 Tower2 kernel: IP: [<f82e525c>] xor_block+0x76/0x84 [md_mod]
May 15 07:13:00 Tower2 kernel: *pdpt = 0000000002de0001 *pde = 0000000000000000
May 15 07:13:00 Tower2 kernel: Oops: 0000 [#1] SMP
May 15 07:13:00 Tower2 kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1c.3/0000:03:00.0/host0/target0:4:0/0:4:0:0/block/sda/stat
May 15 07:13:00 Tower2 kernel: Modules linked in: md_mod ata_piix sata_promise e1000 sata_sil24 libata
May 15 07:13:00 Tower2 kernel:
May 15 07:13:00 Tower2 kernel: Pid: 1906, comm: unraidd Not tainted (2.6.29.1-unRAID #2)
May 15 07:13:00 Tower2 kernel: EIP: 0060:[<f82e525c>] EFLAGS: 00010202 CPU: 0
May 15 07:13:00 Tower2 kernel: EIP is at xor_block+0x76/0x84 [md_mod]
May 15 07:13:00 Tower2 kernel: EAX: 00001000 EBX: 6d614e76 ECX: c2890000 EDX: c28fb000
May 15 07:13:00 Tower2 kernel: ESI: c2893000 EDI: c28fb000 EBP: f63cbefc ESP: f63cbedc
May 15 07:13:00 Tower2 kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
May 15 07:13:00 Tower2 kernel: Process unraidd (pid: 1906, ti=f63ca000 task=f6fff740 task.ti=f63ca000)
May 15 07:13:00 Tower2 kernel: Stack:
May 15 07:13:00 Tower2 kernel:  c2892000 c2893000 c2868000 6d614e76 00001000 00000001 c28f11e0 c28f13a0
May 15 07:13:00 Tower2 kernel:  f63cbf3c f82e7474 00000001 00000005 00000011 00001000 00001000 f5aab000
May 15 07:13:00 Tower2 kernel:  c28fb000 c2890000 c2892000 c2893000 c2868000 c28f1910 00000000 c28f11e0
May 15 07:13:00 Tower2 kernel: Call Trace:
May 15 07:13:00 Tower2 kernel:  [<f82e7474>] ? compute_parity+0x101/0x2cc [md_mod]
May 15 07:13:00 Tower2 kernel:  [<f82e7ff0>] ? handle_stripe+0x8cc/0xc5a [md_mod]
May 15 07:13:00 Tower2 kernel:  [<c0344f57>] ? schedule+0x5e3/0x63c
May 15 07:13:00 Tower2 kernel:  [<f82e8804>] ? unraidd+0x9e/0xbc [md_mod]
May 15 07:13:00 Tower2 kernel:  [<f82e8766>] ? unraidd+0x0/0xbc [md_mod]
May 15 07:13:00 Tower2 kernel:  [<c01301bc>] ? kthread+0x3b/0x63
May 15 07:13:00 Tower2 kernel:  [<c0130181>] ? kthread+0x0/0x63
May 15 07:13:00 Tower2 kernel:  [<c01035db>] ? kernel_thread_helper+0x7/0x10
May 15 07:13:00 Tower2 kernel: Code: f8 8b 71 0c 89 45 ec 75 13 56 8b 45 f0 89 d1 53 8b 5d ec 89 fa ff 53 14 5a 59 eb 15 ff 71 10 89 d1 8b 45 f0 89 fa 56 53 8b 5d ec <ff> 53 18 83 c4 0c 8d 65 f4 5b 5e 5f 5d c3 55 89 e5 57 31 ff 56
May 15 07:13:00 Tower2 kernel: EIP: [<f82e525c>] xor_block+0x76/0x84 [md_mod] SS:ESP 0068:f63cbedc
May 15 07:13:00 Tower2 kernel: ---[ end trace 6c2b49c6389e0999 ]---

 

When this happens, the main web page hangs on "mounting". Any ideas? I'm using version 4.5-beta6.

Thanks


Joe, thanks for the suggestion.

I ran reiserfsck and it found no corruption on disk12. Putting the array back to 15 disks eliminates the error. It only occurs when I try to expand the system past 15 drives.
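
For reference, the check looks something like this (assuming disk12 maps to /dev/md12, as shown in the syslog above; --check is the default, read-only test and does not write to the filesystem):

  reiserfsck --check /dev/md12    # read-only consistency check of disk12's filesystem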

  • 3 weeks later...

Same issue here.

 

The array was great on 4.4final with 16 drives + cache for ~3 months, and after upgrading to 4.5beta6 the initial migration was flawless. However, upon adding a 17th drive the system freezes.

 

- downgraded to 4.5beta4 (initial 20 drive support)... no change

- added 17th drive, assigned to disk 18 or 19... no change

- added 17th drive to different slots on norco-4020 (on both another aoc-sat2-mv8 & x7sbe controller)... no change

- added 17th drive using 3 different wd5000abys (500GB) and a seagate 320GB... no change

- format of USB key, fresh install of 4.4beta6 (erasing unmenu, dir_cache, etc)... no change

- temp disable parity and add 1 wd5000abys... good, array completes mounting (adding parity back will freeze)

- temp disable 2 existing data drives (wd10eacs) and added 2 wd5000abys... good, array completes mounting (adding another drive will freeze)

 

I can see the syslog and it looks similar to ALR's, except without the md12 issue but with a segmentation error instead. Capturing it is proving elusive.

I don't think you, or anyone else, needs to go through hardware tests.

It reads as if there is an issue in the driver - an array that is out of bounds somewhere.


I get this segfault as well when adding a drive, but I already have 16 data drives and 1 parity installed and working. It's when I add the 17th data drive that it borks. It gets all the way through clearing the new disk, tries to mount the drives, and segfaults. I sent the info to Tom.

 

:(

  • 3 weeks later...

I think the only suggestion that can be made right now is to wait for Tom to correct the issue.

 

Okay. I just bought 4 more drives and will wait to add them. Kinda sucks as drive prices fall rapidly, and I've been avoiding buying new stuff until I have the time to use/install it. Still, I am really looking forward to this new feature being fully implemented.


Although you must wait until you can assign them, I'd install them and use the preclear_disk.sh script to test them and ensure they are working properly. Then, once Tom fixes the emhttp process, you can assign them. Hopefully that won't be too long, but you'll be ready and you'll be able to quickly add them.

 

Joe L.


Sounds like a great idea in theory. I'll have to overcome a little phobia of getting to know what to do with these scripts first. Tomorrow is a holiday in Quebec, so I'll try to get things started then.

The preclear_disk script is a doddle to use.  It doesn't require any installation - you just copy it to your flash drive share, log in to unRAID's console (or telnet in), and execute the script with the target drive as the single parameter.  It even has safety checks built in to keep you from clearing a drive that's already in use.
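
For example, from the console or a telnet session it amounts to something like this (assuming the flash share shows up at /boot on the server, as it normally does; /dev/sdX is just a placeholder for the un-assigned drive you want to clear):

  cd /boot                          # the flash drive share, where preclear_disk.sh was copied
  sh ./preclear_disk.sh /dev/sdX    # single parameter = the target drive to clear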


I think the only suggestion that can be made right now is to wait for Tom to correct the issue.

 

Hi all,

It's been a long time since we've heard anything from Tom - did anybody get any info on whether he is aware of the 17+ drives problem and/or working on it?

Isn't there a possibility for a small hotfix?

Thanks, Guzzi


 

The preclear_disk script is a doddle to use.  It doesn't require any installation - you just copy it to your flash drive share, log in to unRAID's console (or telnet in), and execute the script with the target drive as the single parameter.  It even has safety checks built in to keep you from clearing a drive that's already in use.

 

I also discovered (and this may already be known) that if you open more than one telnet session, you can preclear several drives at one time (one per session).
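
In other words, something like this, with each command running in its own telnet session (device names are placeholders for two different un-assigned drives):

  sh /boot/preclear_disk.sh /dev/sdX    # telnet session #1
  sh /boot/preclear_disk.sh /dev/sdY    # telnet session #2, clearing a second drive in parallel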

