BUG: Bad page map in process shfs



I keep getting this error, after which all shares become unavailable from the network. It has only happened while copying large numbers of files to a share over the network.

 

Jul 24 11:36:31 Storage_Server kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed
Jul 24 11:36:31 Storage_Server kernel: hdparm: sending ioctl 2285 to a partition!
Jul 24 11:36:32 Storage_Server last message repeated 5 times
Jul 24 11:36:32 Storage_Server kernel: smartctl: sending ioctl 2285 to a partition!
Jul 24 11:36:32 Storage_Server last message repeated 3 times
Jul 24 11:37:33 Storage_Server kernel: scsi_verify_blk_ioctl: 36 callbacks suppressed
Jul 24 11:37:33 Storage_Server kernel: hdparm: sending ioctl 2285 to a partition!
Jul 24 11:37:34 Storage_Server last message repeated 5 times
Jul 24 11:37:34 Storage_Server kernel: smartctl: sending ioctl 2285 to a partition!
Jul 24 11:37:34 Storage_Server last message repeated 3 times
Jul 24 11:38:22 Storage_Server kernel: shfs[5593]: segfault at ffffffff ip ffffffff sp b3fff2d8 error 14
Jul 24 11:38:22 Storage_Server kernel: BUG: Bad page map in process shfs  pte:0a000000 pmd:5488a067
Jul 24 11:38:22 Storage_Server kernel: addr:b3fd0000 vm_flags:00100077 anon_vma:f3aca190 mapping:  (null) index:b3fd0
Jul 24 11:38:22 Storage_Server kernel: Pid: 7380, comm: shfs Not tainted 3.0.35-unRAID #2
Jul 24 11:38:22 Storage_Server kernel: Call Trace:
Jul 24 11:38:22 Storage_Server kernel:  [] print_bad_pte+0x147/0x159
Jul 24 11:38:22 Storage_Server kernel:  [] zap_pte_range+0x231/0x319
Jul 24 11:38:22 Storage_Server kernel:  [] unmap_page_range+0x14d/0x154
Jul 24 11:38:22 Storage_Server kernel:  [] unmap_vmas+0x65/0x86
Jul 24 11:38:22 Storage_Server kernel:  [] exit_mmap+0x65/0xbd
Jul 24 11:38:22 Storage_Server kernel:  [] mmput+0x1f/0x8f
Jul 24 11:38:22 Storage_Server kernel:  [] exit_mm+0xf9/0x101
Jul 24 11:38:22 Storage_Server kernel:  [] do_exit+0x1db/0x274
Jul 24 11:38:22 Storage_Server kernel:  [] ? dequeue_signal+0xa1/0x115
Jul 24 11:38:22 Storage_Server kernel:  [] do_group_exit+0x65/0x8e
Jul 24 11:38:22 Storage_Server kernel:  [] get_signal_to_deliver+0x29b/0x2ae
Jul 24 11:38:22 Storage_Server kernel:  [] do_signal+0x5a/0xeb
Jul 24 11:38:22 Storage_Server kernel:  [] ? vfs_read+0x88/0xfa
Jul 24 11:38:22 Storage_Server kernel:  [] ? do_sync_write+0xc5/0xc5
Jul 24 11:38:22 Storage_Server kernel:  [] ? vfs_writev+0x36/0x44
Jul 24 11:38:22 Storage_Server kernel:  [] do_notify_resume+0x23/0x44
Jul 24 11:38:22 Storage_Server kernel:  [] work_notifysig+0x13/0x19
Jul 24 11:38:22 Storage_Server kernel:  [] ? _cpu_down+0xc4/0x1bc
Jul 24 11:38:22 Storage_Server kernel: Disabling lock debugging due to kernel taint
Jul 24 11:38:22 Storage_Server kernel: BUG: Bad page map in process shfs  pte:22000000 pmd:5488a067
Jul 24 11:38:22 Storage_Server kernel: addr:b3fd1000 vm_flags:00100077 anon_vma:f3aca190 mapping:  (null) index:b3fd1
Jul 24 11:38:22 Storage_Server kernel: Pid: 7380, comm: shfs Tainted: G    B       3.0.35-unRAID #2
Jul 24 11:38:22 Storage_Server kernel: Call Trace:
Jul 24 11:38:22 Storage_Server kernel:  [] print_bad_pte+0x147/0x159
Jul 24 11:38:22 Storage_Server kernel:  [] zap_pte_range+0x231/0x319
Jul 24 11:38:22 Storage_Server kernel:  [] unmap_page_range+0x14d/0x154
Jul 24 11:38:22 Storage_Server kernel:  [] unmap_vmas+0x65/0x86
Jul 24 11:38:22 Storage_Server kernel:  [] exit_mmap+0x65/0xbd
Jul 24 11:38:22 Storage_Server kernel:  [] mmput+0x1f/0x8f
Jul 24 11:38:22 Storage_Server kernel:  [] exit_mm+0xf9/0x101
Jul 24 11:38:22 Storage_Server kernel:  [] do_exit+0x1db/0x274
Jul 24 11:38:22 Storage_Server kernel:  [] ? dequeue_signal+0xa1/0x115
Jul 24 11:38:22 Storage_Server kernel:  [] do_group_exit+0x65/0x8e
Jul 24 11:38:22 Storage_Server kernel:  [] get_signal_to_deliver+0x29b/0x2ae
Jul 24 11:38:22 Storage_Server kernel:  [] do_signal+0x5a/0xeb
Jul 24 11:38:22 Storage_Server kernel:  [] ? vfs_read+0x88/0xfa
Jul 24 11:38:22 Storage_Server kernel:  [] ? do_sync_write+0xc5/0xc5
Jul 24 11:38:22 Storage_Server kernel:  [] ? vfs_writev+0x36/0x44
Jul 24 11:38:22 Storage_Server kernel:  [] do_notify_resume+0x23/0x44
Jul 24 11:38:22 Storage_Server kernel:  [] work_notifysig+0x13/0x19
Jul 24 11:38:22 Storage_Server kernel:  [] ? _cpu_down+0xc4/0x1bc
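
For anyone comparing reports like the two above, here is a minimal Python sketch that pulls the pte, pmd and addr fields out of the "BUG: Bad page map" lines. The regular expressions and the function name `parse_bad_page_maps` are my own assumptions based on the format shown in this log; this is not an unRAID or kernel tool.

```python
import re

# Patterns are assumptions derived from the two reports quoted in this post.
BUG_RE = re.compile(
    r"BUG: Bad page map in process (\S+)\s+pte:([0-9a-f]+) pmd:([0-9a-f]+)"
)
ADDR_RE = re.compile(r"kernel: addr:([0-9a-f]+) vm_flags:([0-9a-f]+)")


def parse_bad_page_maps(lines):
    """Return one dict per 'Bad page map' report found in the syslog lines."""
    reports = []
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            reports.append({
                "process": m.group(1),
                "pte": int(m.group(2), 16),
                "pmd": int(m.group(3), 16),
                "addr": None,  # filled in by the 'addr:' line that follows
            })
            continue
        m = ADDR_RE.search(line)
        if m and reports and reports[-1]["addr"] is None:
            reports[-1]["addr"] = int(m.group(1), 16)
    return reports


# Lines copied from the log above.
SAMPLE = """\
Jul 24 11:38:22 Storage_Server kernel: BUG: Bad page map in process shfs  pte:0a000000 pmd:5488a067
Jul 24 11:38:22 Storage_Server kernel: addr:b3fd0000 vm_flags:00100077 anon_vma:f3aca190 mapping:  (null) index:b3fd0
Jul 24 11:38:22 Storage_Server kernel: BUG: Bad page map in process shfs  pte:22000000 pmd:5488a067
Jul 24 11:38:22 Storage_Server kernel: addr:b3fd1000 vm_flags:00100077 anon_vma:f3aca190 mapping:  (null) index:b3fd1
""".splitlines()

if __name__ == "__main__":
    for r in parse_bad_page_maps(SAMPLE):
        print(f"{r['process']}: addr={r['addr']:#010x} "
              f"pte={r['pte']:#010x} pmd={r['pmd']:#010x}")
```

On the sample lines, the two reports are 0x1000 bytes apart (adjacent pages, assuming the usual 4 KiB page size) and share the same pmd value. That only summarizes what the log already shows; it does not identify the cause.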

Link to comment

What version of unRAID?

Have you tried running unRAID without additional plugins to troubleshoot the issue?

Have you tried performing a memtest overnight?

Also, post a FULL syslog if possible so people can see everything that went on the system leading up to the issue.

Link to comment

Running 5.0RC5. Yes, I have tried running without plugins and still have the same issue. I can post the full syslog.txt if you think it will help more than the extract. I have noticed that the disk shares are still available, while the user shares are not.

Link to comment

I'm going to restart the server and see if I can reproduce the error with a shorter syslog (no parity check). When I get it to crash again, I will leave it crashed in case someone needs me to do something while it is in that state.

Link to comment

Got it to crash "better" than before. Can't get any response from the system at all, not even from the machine's keyboard...

 

I typed this by hand so please excuse me if there is a typo.

[<c102d0cf>] ? irq_enter+0x3c/0x3c
<IRQ> [<c102cf8d>] ? irq_exit+0x32/0x53
[<c1015d7e>] ? smp_apic_timer_interrupt+0x6c/0x7a
[<c130f902>] ? apic_timer_interrupt+0x2a/0x30
[<c108007b>] ? wait_on_retry_sync_kiocb+0xe/0x41
[<c108fe8e>] ? d_alloc+0x74/0x14a
[<c10879db>] ? d_alloc_and_lookup+0x1f/0x4f
[<c1087eee>] ? do_lookup+0x19e/0x262
[<c10883a3>] ? link_path_walk+0x1d8/0x5e4
[<c1088b15>] ? path_lookupat+0x4c/0x4ba
[<c1088f72>] ? path_lookupat+0x4a9/0x4ba
[<c1089e39>] ? getname_flags+0x21/0xbe
[<c1088f9f>] ? do_path_lookup+0x1c/0x4e
[<c1089f14>] ? user_path_at_empty+0x3e/0x69
[<c10894fc>] ? user_path_at+0xd/0xf
[<c1083739>] ? vfs_fstatat+0x51/0x78
[<c10837a4>] ? vfs_lstat+0x16/0x18
[<c10837ba>] ? sys_lstat64+0x14/0x28
[<c10815e0>] ? __fput+0x186/0x68f
[<c10815fc>] ? fput+0x13/0x15
[<c107ece1>] ? filp_close+0x57/0x61
[<c107ed45>] ? sys_close+0x5a/0x88
[<c130f525>] ? syscall_call+0x7/0xb
[<c1300000>] ? _cpu_down+0xc4/0x1bc

 

It seems to happen when I am copying a lot of files, but I have nothing to confirm this. It is just a pattern I am noticing.

Link to comment

Another Crash, Another Log...

 

http://dcwebsupport.com/syslog2.txt

 

I noticed similarities between the crashes...

 

Jul 24 11:38:22 Storage_Server kernel: shfs[5593]: segfault at ffffffff ip ffffffff sp b3fff2d8 error 14 (from first log)

 

and

 

Jul 24 18:06:18 Storage_Server kernel: shfs[4295]: segfault at 345ad3f4 ip b74c1251 sp b63d6f50 error 4 in libc-2.11.1.so[b744d000+15c000] (from second log)

 

The crash seems to come right after these lines... need help!!!
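
To make the two segfault lines easier to compare, here is a rough Python sketch that splits them into fields and decodes the error code. The regex and the names `decode_segfault` and `ERROR_BITS` are my own, based only on the two lines quoted above. As far as I know the kernel prints the error value in hexadecimal, so "error 14" is read as 0x14; the bit meanings are the standard x86 page-fault error code bits.

```python
import re

# Format assumed from the two segfault lines quoted above.
SEGV_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_segfault(line):
    """Parse one kernel 'segfault at ...' line; return None if it does not match."""
    m = SEGV_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the error code is printed in hex
    flags = []
    for mask, if_set, if_clear in ERROR_BITS:
        if err & mask:
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    result = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "error": err,
        "flags": flags,
    }
    if m.group("obj"):
        # Offset of the faulting instruction inside the mapped object.
        result["object"] = m.group("obj")
        result["offset"] = result["ip"] - int(m.group("base"), 16)
    return result


FIRST = ("Jul 24 11:38:22 Storage_Server kernel: shfs[5593]: segfault at "
         "ffffffff ip ffffffff sp b3fff2d8 error 14")
SECOND = ("Jul 24 18:06:18 Storage_Server kernel: shfs[4295]: segfault at "
          "345ad3f4 ip b74c1251 sp b63d6f50 error 4 in "
          "libc-2.11.1.so[b744d000+15c000]")

if __name__ == "__main__":
    for line in (FIRST, SECOND):
        print(decode_segfault(line))
```

Read this way, the first crash is a user-mode instruction fetch from ffffffff (the process jumped to an invalid address), while the second is a user-mode read of 345ad3f4 by code 0x74251 bytes into libc-2.11.1.so. Both are in shfs, but the faults themselves differ, so this narrows things down without pointing at a single cause.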

Link to comment

I ran memtest for more than 48 hours with no errors. I rebooted the system and the "bug" didn't reappear, so I'm not sure whether it will come back further down the track or whether it was just playing up that day. I have now also upgraded to 5.0-rc6-r8168-test2, which seems to be running smoothly so far, using an M1015 flashed with LSI firmware.

Link to comment
