Server locking up - kernel: divide error: 0000


Approximately a month ago I installed a new motherboard, CPU and RAM (Super Micro X8DTH-iF, X5650, 24 GB ECC). Everything was running perfectly until a few days ago, when the server locked up with kernel issues. I had to do an unclean shutdown because it wouldn't shut down through the webgui. The syslog was gone on reboot. At the time I had been trying out the Kodi webserver Chorus, which has bogged down the server for me in the past. It may also have been in the middle of running sabToSickBeard.py.

 

The parity check on reboot finished with no errors, and the server ran fine until I found the filesystem locked up again this morning. It locked up in the middle of running sabToSickBeard.py. It wouldn't shut down through the webgui, but this time I remembered to try the powerdown command from the terminal, and it worked.

 

Upon reboot it mounted the disks and started a parity check, but for some reason it then stopped the array, and I had to start the array again from the webgui.

 

Attached are two syslogs. One from the lock up (line 2627) and the other from after rebooting.

 

If anyone could help, it would be very much appreciated. I'm not sure whether running reiserfsck is needed, or whether upgrading to unRAID 6 would help, since the hardware is a bit newer.

 

unRAID 5.0.6

 

 

 

Mar 23 01:30:14 Leviathan kernel: divide error: 0000 [#1] SMP
Mar 23 01:30:14 Leviathan kernel: Modules linked in: ntfs md_mod coretemp sg ahci libahci acpi_cpufreq i2c_i801 mpt2sas scsi_transport_sas raid_class igb hwmon i2c_algo_bit i2c_core ptp pps_core mperf
Mar 23 01:30:14 Leviathan kernel: Pid: 27731, comm: shfs Not tainted 3.9.11p-unRAID #5 NetScout Systems, Inc. InfiniStream-69XX/X8DTH
Mar 23 01:30:14 Leviathan kernel: EIP: 0060:[<c10731d3>] EFLAGS: 00210246 CPU: 4
Mar 23 01:30:14 Leviathan kernel: EIP is at bdi_position_ratio+0x183/0x1e8
Mar 23 01:30:14 Leviathan kernel: EAX: 00000000 EBX: 000081e8 ECX: 00000002 EDX: 00000000
Mar 23 01:30:14 Leviathan kernel: ESI: 00000000 EDI: 00000000 EBP: e8eabcfc ESP: e8eabccc
Mar 23 01:30:14 Leviathan kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 23 01:30:14 Leviathan kernel: CR0: 80050033 CR2: 40c9cf0c CR3: 28442000 CR4: 000007f0
Mar 23 01:30:14 Leviathan kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 23 01:30:14 Leviathan kernel: DR6: ffff0ff0 DR7: 00000400
Mar 23 01:30:14 Leviathan kernel: Process shfs (pid: 27731, ti=e8eaa000 task=e8d39b00 task.ti=e8eaa000)
Mar 23 01:30:14 Leviathan kernel: Stack:
Mar 23 01:30:14 Leviathan kernel:  e8eabcec 40f60000 00000000 00000000 ffffffff 00000001 000040f6 00000000
Mar 23 01:30:14 Leviathan kernel:  fffffb08 000081df f74353d4 e8d39b00 e8eabd88 c1073be3 000081df 00000000
Mar 23 01:30:14 Leviathan kernel:  00000001 01188519 00000000 01188519 f74353f4 f743540c 00dc4501 c10f6ce0
Mar 23 01:30:14 Leviathan kernel: Call Trace:
Mar 23 01:30:14 Leviathan kernel:  [<c1073be3>] balance_dirty_pages+0x25e/0x3ff
Mar 23 01:30:14 Leviathan kernel:  [<c10f6ce0>] ? reiserfs_end_persistent_transaction+0x3d/0x44
Mar 23 01:30:14 Leviathan kernel:  [<c1073e3f>] balance_dirty_pages_ratelimited+0xbb/0xc0
Mar 23 01:30:14 Leviathan kernel:  [<c106c5b2>] generic_perform_write+0x15a/0x19d
Mar 23 01:30:14 Leviathan kernel:  [<c106c63b>] generic_file_buffered_write+0x46/0x70
Mar 23 01:30:14 Leviathan kernel:  [<c106d75d>] __generic_file_aio_write+0x36e/0x3ac
Mar 23 01:30:14 Leviathan kernel:  [<c106d804>] generic_file_aio_write+0x69/0xc2
Mar 23 01:30:14 Leviathan kernel:  [<c1096415>] do_sync_write+0x77/0xae
Mar 23 01:30:14 Leviathan kernel:  [<c10e5df1>] reiserfs_file_write+0x66/0x6e
Mar 23 01:30:14 Leviathan kernel:  [<c1096e75>] vfs_write+0x8e/0x110
Mar 23 01:30:14 Leviathan kernel:  [<c10e5d8b>] ? reiserfs_file_open+0x53/0x53
Mar 23 01:30:14 Leviathan kernel:  [<c1096f40>] sys_pwrite64+0x49/0x5f
Mar 23 01:30:14 Leviathan kernel:  [<c1401190>] syscall_call+0x7/0xb
Mar 23 01:30:14 Leviathan kernel:  [<c1400000>] ? __schedule+0x3b/0x490
Mar 23 01:30:14 Leviathan kernel: Code: 2b 45 e8 8b 75 ec 2b 75 10 8d 78 01 89 d8 0f af c6 89 45 ec 89 c8 8b 5d ec f7 e6 31 f6 01 d3 89 da 89 c3 39 fa 72 08 89 d0 31 d2 <f7> f7 89 c6 89 d8 f7 f7 89 f2 eb 1a 89 df 31 d2 c1 ff 1f c1 ff
Mar 23 01:30:14 Leviathan kernel: EIP: [<c10731d3>] bdi_position_ratio+0x183/0x1e8 SS:ESP 0068:e8eabccc
Mar 23 01:30:14 Leviathan kernel: ---[ end trace de2a2cca09293a04 ]---
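
For anyone digging through a long syslog for a trace like the one above, here is a minimal Python sketch that pulls out the error type, the process, the faulting function and the call chain. The function name `summarize_oops` and the regular expressions are my own assumptions, written against the line layout shown in this post, so other kernels or syslog formats may need adjusting.

```python
import re

# "divide error: 0000 [#1] SMP" -> captures "divide error"
ERROR = re.compile(r"^(.+?): [0-9a-f]{4} \[#\d+\]")
# "Pid: 27731, comm: shfs Not tainted ..." -> captures "shfs"
COMM = re.compile(r"comm: (\S+)")
# "EIP is at bdi_position_ratio+0x183/0x1e8" -> captures the symbol
FAULT_AT = re.compile(r"EIP is at ([^\s+]+)\+0x")
# "[<c1073be3>] ? symbol+0x.." -> optional "? " marks a frame the kernel is unsure of
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([^\s+]+)\+0x")


def summarize_oops(lines):
    """Collect the key facts from the kernel lines of one oops (hypothetical helper)."""
    summary = {"error": None, "process": None, "fault": None, "trace": []}
    in_trace = False
    for raw in lines:
        # Drop the "Mar 23 01:30:14 Leviathan kernel: " prefix; skip non-kernel lines.
        _, sep, msg = raw.rstrip("\n").partition(" kernel: ")
        if not sep:
            continue
        msg = msg.strip()
        if in_trace:
            frame = FRAME.match(msg)
            if frame:
                # Store (symbol, reliable); reliable is False for "?" frames.
                summary["trace"].append((frame.group(2), frame.group(1) is None))
                continue
            in_trace = False  # first non-frame line ends the call trace
        if msg == "Call Trace:":
            in_trace = True
        elif summary["error"] is None and ERROR.match(msg):
            summary["error"] = ERROR.match(msg).group(1)
        elif summary["process"] is None and msg.startswith("Pid:"):
            found = COMM.search(msg)
            summary["process"] = found.group(1) if found else None
        elif summary["fault"] is None and FAULT_AT.search(msg):
            summary["fault"] = FAULT_AT.search(msg).group(1)
    return summary
```

Fed the lines above, this should report a divide error in `bdi_position_ratio`, hit by the `shfs` process while writing through reiserfs.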

Link to comment

This is a crazy one!  A divide-by-zero error is a software bug: either stupid code or bad data, unchecked for zero value before use.  It seems impossible that no one would have run into it before now, but as far as I know, you're the first.  No ideas; it doesn't seem possible.
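
To make "unchecked for zero value before use" concrete, here is a small Python sketch. It is not the kernel's code; the function names and the wraparound scenario are assumptions for illustration only. What the trace itself shows, as far as I can read it, is that the marked opcode bytes `<f7> f7` are a divide by the EDI register, and the register dump has EDI: 00000000.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned register arithmetic


def scaled_ratio_unchecked(value, upper, setpoint):
    """Hypothetical example: divides without checking the divisor."""
    # In 32-bit arithmetic, (upper - setpoint + 1) wraps to 0
    # when upper - setpoint is 0xFFFFFFFF.
    divisor = (upper - setpoint + 1) & MASK32
    return (value & MASK32) // divisor  # ZeroDivisionError if divisor == 0


def scaled_ratio_checked(value, upper, setpoint):
    """Same calculation, but the zero divisor is handled before use."""
    divisor = (upper - setpoint + 1) & MASK32
    if divisor == 0:
        return 0  # assumed fallback; a real fix would pick a meaningful value
    return (value & MASK32) // divisor
```

With ordinary inputs both versions agree; only the unchecked one blows up on the rare input that makes the divisor zero, which would fit a bug that stays hidden for years.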

 

You do have an odd path above.  Is that subfolder of Downloads just a hyphen, space, and TV?  Is there possibly a bizarre character before that hyphen?  Just grasping at straws ...

Link to comment

I came across two others in the unRAID forums. One has pretty much the same syslog info, but it doesn't appear to have been resolved.

http://lime-technology.com/forum/index.php?topic=31340.0

 

And another a bit different

http://lime-technology.com/forum/index.php?topic=32624.0

 

I've been running unRAID for 4+ years now with no major issues. Like I said, I replaced my board, CPU and RAM a month ago, but there were no issues until Sunday night, when I was testing the Kodi Chorus webserver. The issue first appeared when it tried pulling a bunch of stuff from the server. I remember trying it once before and it causing issues, but I can't recall exactly what, other than bogging things down. Something was also coming in from Sickbeard at the same time. Maybe it corrupted MySQL a bit, and that's why it popped up again when trying to add stuff to the library a few days later. That's the only thing I can guess at.

 

I saw several bug reports about this issue in older Linux kernels, but supposedly they were fixed. I've run reiserfsck on everything and am currently going through the drives with long SMART tests. Everything has come out clean so far. dgaschk advised running new permissions in that first link, but I haven't done that yet. I think the plan now is to upgrade to 6, redo all my plugins, and start converting disks to XFS.
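
In case it helps anyone repeating these checks, here is a rough Python sketch that only builds and prints the commands rather than running them. The device names (`/dev/sdb`, `/dev/md1`) are placeholders, and pointing reiserfsck at the `/dev/mdX` device with the array in maintenance mode is my assumption about the usual unRAID practice, so check it against your own setup first.

```python
import shlex


def smart_long_test_cmd(device):
    """Command that starts a long (extended) SMART self-test on a drive."""
    return ["smartctl", "-t", "long", device]


def smart_selftest_log_cmd(device):
    """Command that shows the drive's self-test log once the test is done."""
    return ["smartctl", "-l", "selftest", device]


def reiserfsck_check_cmd(md_device):
    """Command for a read-only ReiserFS consistency check."""
    return ["reiserfsck", "--check", md_device]


def print_plan(drives, md_devices):
    """Print the commands for review; nothing is executed here."""
    for md in md_devices:
        print(shlex.join(reiserfsck_check_cmd(md)))
    for dev in drives:
        print(shlex.join(smart_long_test_cmd(dev)))
        print(shlex.join(smart_selftest_log_cmd(dev)))


if __name__ == "__main__":
    # Placeholder device names, not taken from this server.
    print_plan(["/dev/sdb"], ["/dev/md1"])
```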

 

My biggest concern was that it was an issue with the new hardware, but I haven't seen anything to indicate that's the case, and you don't seem to think so either. So, dunno.

 

That path is just hyphen space TV. I just have the hyphen there to keep it at the top of the list so it's easy to find. It's been like that for years without issue.

Link to comment

I asked clowrym, the OP of that first divide error thread I found, over PM and he advised this:

 

"If I remember correctly, my issue was due to a failing disk #5 in the end, although nothing was showing in the SMART data. A few weeks after those threads it red balled; I replaced it and didn't have any issues after. I did end up replacing the power connect with an HP 24-port managed switch about the same time as replacing the drive, but I truly do believe the drive was my issue. Once I removed it, I tried a pre-clear on the drive, but it wouldn't pass."

 

 

A friend more knowledgeable than I am thinks it's one of two things. Either there's corruption that reiserfsck doesn't see but reiserfs can't understand, so the fs panics whenever it tries to traverse that part of the tree, or there's a bad spot on the disk that it chokes on sporadically: fine sometimes, puking other times.

 

So I guess I'll find out if the drive dies soon or after moving to XFS. Hopefully it's not the drive, because that would make it the second time I've bought a new 4 TB drive intending to move to XFS and ended up using it to replace a dying drive instead.

Link to comment
