
[SOLVED] Data Rebuild grinds to a halt



I'm running UnRaid 5.0 as a VM on my ESXi 5.5 host with an IBM M1015 card passed through.

 

Disk 3 in my system red-balled. I believe the cause was a faulty SAS cable, because smartctl reported no errors on the disk, so I replaced the cable and followed the instructions on here to re-add the drive, which kicked off a data rebuild. The estimated speed at the beginning was around 50-70 MB/s, and it said the rebuild would take roughly 8-10 hours to complete. After a couple of hours I went back to check: the speed had dropped to 400 KB/s and the estimate had grown to 100 days. I checked the syslog and noticed a bunch of errors that started almost an hour after the rebuild began (at 8:23).
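
As a rough sanity check on those estimates, the arithmetic can be sketched in Python. This is only a sketch under assumptions: it treats the "blocks" figure from the syslog line "over a total of 1953514552 blocks" as 1 KiB blocks (which is consistent with a 2 TB disk), it reads MB and KB as MiB and KiB, and the function name `rebuild_eta_seconds` is made up for illustration.

```python
def rebuild_eta_seconds(total_kib: int, speed_kib_per_s: float) -> float:
    """Seconds needed to process total_kib at a constant speed."""
    if speed_kib_per_s <= 0:
        raise ValueError("speed must be positive")
    return total_kib / speed_kib_per_s


# Block count taken from the syslog; assumed to be 1 KiB blocks.
TOTAL_KIB = 1953514552

# Speeds seen during the rebuild, converted to KiB/s.
speeds = [("50 MB/s", 50 * 1024), ("70 MB/s", 70 * 1024), ("400 KB/s", 400)]

for label, speed in speeds:
    secs = rebuild_eta_seconds(TOTAL_KIB, speed)
    print(f"{label}: {secs / 3600:.1f} hours ({secs / 86400:.1f} days)")
```

At the starting speeds this lands in the same range as the 8-10 hours the web interface reported, while a full pass at 400 KB/s works out to weeks rather than hours.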

 

Jan 26 07:27:11 Tower emhttp_event: disks_mounted

Jan 26 07:27:11 Tower kernel: mdcmd (45): check CORRECT

Jan 26 07:27:11 Tower kernel: md: recovery thread woken up ...

Jan 26 07:27:11 Tower kernel: md: recovery thread rebuilding disk3 ...

Jan 26 07:27:12 Tower kernel: md: using 4096k window, over a total of 1953514552 blocks.

Jan 26 07:27:12 Tower emhttp: shcmd (55): :>/etc/samba/smb-shares.conf

Jan 26 07:27:12 Tower emhttp: shcmd (56): cp /etc/netatalk/AppleVolumes.default- /etc/netatalk/AppleVolumes.default

Jan 26 07:27:13 Tower emhttp: Restart SMB...

Jan 26 07:27:13 Tower emhttp: shcmd (57): killall -HUP smbd

Jan 26 07:27:13 Tower emhttp: shcmd (58): ps axc | grep -q rpc.mountd

Jan 26 07:27:13 Tower emhttp: _shcmd: shcmd (58): exit status: 1

Jan 26 07:27:13 Tower emhttp: Restart AFP...

Jan 26 07:27:13 Tower emhttp: shcmd (59): killall -HUP afpd

Jan 26 07:27:13 Tower emhttp: shcmd (60): /usr/local/sbin/emhttp_event svcs_restarted

Jan 26 07:27:13 Tower emhttp_event: svcs_restarted

Jan 26 08:23:10 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at  (null)

Jan 26 08:23:10 Tower kernel: IP: [<c124bd97>] __blk_recalc_rq_segments+0x77/0x1db

Jan 26 08:23:10 Tower kernel: *pdpt = 0000000036de7001 *pde = 0000000000000000

Jan 26 08:23:10 Tower kernel: Oops: 0002 [#1] SMP

Jan 26 08:23:10 Tower kernel: Modules linked in: md_mod sg ata_piix vmxnet3 coretemp hwmon mpt2sas scsi_transport_sas raid_class piix i2c_piix4 i2c_core

Jan 26 08:23:10 Tower kernel: Pid: 1869, comm: mdrecoveryd Not tainted 3.9.6p-unRAID #23 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform

Jan 26 08:23:10 Tower kernel: EIP: 0060:[<c124bd97>] EFLAGS: 00010296 CPU: 0

Jan 26 08:23:10 Tower kernel: EIP is at __blk_recalc_rq_segments+0x77/0x1db

Jan 26 08:23:10 Tower kernel: EAX: f0c03f50 EBX: f0c03f00 ECX: 00000000 EDX: 00000001

Jan 26 08:23:10 Tower kernel: ESI: f51a7700 EDI: 00000000 EBP: f475dd14 ESP: f475dcc0

Jan 26 08:23:10 Tower kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068

Jan 26 08:23:10 Tower kernel: CR0: 80050033 CR2: 00000000 CR3: 36de8000 CR4: 001407f0

Jan 26 08:23:10 Tower kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000

Jan 26 08:23:10 Tower kernel: DR6: ffff0ff0 DR7: 00000400

Jan 26 08:23:10 Tower kernel: Process mdrecoveryd (pid: 1869, ti=f475c000 task=f6f54000 task.ti=f475c000)

Jan 26 08:23:10 Tower kernel: Stack:

Jan 26 08:23:10 Tower kernel:  f47710a0 f47710bc 00000000 0148eaa0 f0c03f00 00000001 00000000 00000000

Jan 26 08:23:10 Tower kernel:  00000001 00030c37 000324cd f0c03f00 00001000 00000000 f47731e0 00000000

Jan 26 08:23:10 Tower kernel:  00000000 f3c54f00 f0c03f00 00000000 f0c03f00 f475dd24 c124bf29 f3edc948

Jan 26 08:23:10 Tower kernel: Call Trace:

Jan 26 08:23:10 Tower kernel:  [<c124bf29>] blk_recount_segments+0x16/0x24

Jan 26 08:23:10 Tower kernel:  [<c124c665>] ll_back_merge_fn+0x89/0xcf

Jan 26 08:23:10 Tower kernel:  [<c1247ec6>] bio_attempt_back_merge+0x12/0x5e

Jan 26 08:23:10 Tower kernel:  [<c12493bf>] blk_queue_bio+0x96/0x237

Jan 26 08:23:10 Tower kernel:  [<c1247da5>] generic_make_request+0x7b/0xb0

Jan 26 08:23:10 Tower kernel:  [<f859ff31>] handle_stripe+0xc91/0xcf6 [md_mod]

Jan 26 08:23:10 Tower kernel:  [<f85a0086>] unraid_sync+0x3b/0x49 [md_mod]

Jan 26 08:23:10 Tower kernel:  [<f859bd0b>] md_do_sync+0x13c/0x3b3 [md_mod]

Jan 26 08:23:10 Tower kernel:  [<c103efb5>] ? wake_up_bit+0x5b/0x5b

Jan 26 08:23:10 Tower kernel:  [<c103eb75>] kthread+0x90/0x95

Jan 26 08:23:10 Tower kernel:  [<f859cb21>] ? import_device+0x166/0x166 [md_mod]

Jan 26 08:23:10 Tower kernel:  [<c1403837>] ret_from_kernel_thread+0x1b/0x28

Jan 26 08:23:10 Tower kernel:  [<c103eae5>] ? kthread_freezable_should_stop+0x4a/0x4a

Jan 26 08:23:10 Tower kernel: Code: 3c c7 45 e0 00 00 00 00 89 5d bc 89 4d b4 e9 29 01 00 00 8b 30 8b 1d 00 0d 61 c1 8b 55 e4 29 de c1 fe 05 89 75 d0 3b b2 a0 02 00 <00> 0f 97 c2 83 7d cc 00 0f b6 d2 89 55 c8 0f 85 d1 00 00 00 85

Jan 26 08:23:10 Tower kernel: EIP: [<c124bd97>] __blk_recalc_rq_segments+0x77/0x1db SS:ESP 0068:f475dcc0

Jan 26 08:23:10 Tower kernel: CR2: 0000000000000000

Jan 26 08:23:10 Tower kernel: ---[ end trace e23426bcabb77d3b ]---

Jan 26 08:23:10 Tower kernel: ------------[ cut here ]------------

Jan 26 08:23:10 Tower kernel: WARNING: at kernel/exit.c:715 do_exit+0x48/0x2b2()

Jan 26 08:23:10 Tower kernel: Hardware name: VMware Virtual Platform

Jan 26 08:23:10 Tower kernel: Modules linked in: md_mod sg ata_piix vmxnet3 coretemp hwmon mpt2sas scsi_transport_sas raid_class piix i2c_piix4 i2c_core

 

I've attached my complete syslog. Can someone advise what the issue might be? I have tried rebuilding the disk a number of times, but it ends the same way each time, although at different points in the rebuild process.
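
For anyone who wants to find where the trouble starts in a long syslog, a small filter along these lines may help. It is a minimal sketch, not a definitive tool: the list of markers is an illustrative assumption based on the lines in the excerpt above, the helper name `find_crash_lines` is hypothetical, and the sample lines are copied from that excerpt.

```python
import re

# Markers that show up in the kernel crash report quoted above.
# This list is an assumption for illustration and is not exhaustive.
CRASH_MARKERS = re.compile(r"kernel: (BUG:|Oops:|WARNING:|Call Trace:)")


def find_crash_lines(lines):
    """Return (line_number, text) pairs for lines matching a crash marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if CRASH_MARKERS.search(line)
    ]


# A few lines copied from the syslog excerpt in this post.
sample = [
    "Jan 26 07:27:11 Tower kernel: md: recovery thread rebuilding disk3 ...",
    "Jan 26 08:23:10 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at  (null)",
    "Jan 26 08:23:10 Tower kernel: Oops: 0002 [#1] SMP",
    "Jan 26 08:23:10 Tower kernel: Call Trace:",
    "Jan 26 08:23:10 Tower kernel: WARNING: at kernel/exit.c:715 do_exit+0x48/0x2b2()",
]

for number, text in find_crash_lines(sample):
    print(number, text)
```

To run it against the attached file instead of the sample, open syslog.txt and pass the file object to `find_crash_lines`; the first hit shows when the kernel first reported a problem.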

 

Thanks.  :)

syslog.txt


I did a bit more reading on the forum, and one of the posts mentioned that kernel errors are usually the result of memory issues. I ran a memtest program against my 16GB of RAM and, sure enough, it reported 94 errors. So it looks like I'll be replacing at least one stick.

