Riot Posted March 23, 2016

Approximately a month ago I installed a new motherboard, CPU and RAM (Supermicro X8DTH-iF, X5650, 24 GB ECC). Everything ran perfectly until a few days ago, when the server locked up with kernel issues. I had to do an unclean shutdown because it wouldn't shut down through the web GUI, and syslog was gone on reboot. At the time I had been trying out Chorus, the Kodi web interface, which had bogged down the server when I tried it in the past. The server may also have been in the middle of running sabToSickBeard.py.

The parity check on reboot found no errors, and the server ran fine until this morning, when I found the filesystem locked up again, once more in the middle of running sabToSickBeard.py. It wouldn't shut down through the web GUI, but this time I remembered to try the powerdown command from the terminal, and that worked. On reboot it mounted the disks and started a parity check, but for some reason it then stopped the array and I had to start the array again from the web GUI.

Attached are two syslogs: one from the lockup (line 2627) and one from after rebooting. Any help would be very much appreciated. I'm not sure whether running reiserfsck is needed, or whether upgrading to unRAID 6 would help given that the hardware is a bit newer.

unRAID 5.0.6

Mar 23 01:30:14 Leviathan kernel: divide error: 0000 [#1] SMP
Mar 23 01:30:14 Leviathan kernel: Modules linked in: ntfs md_mod coretemp sg ahci libahci acpi_cpufreq i2c_i801 mpt2sas scsi_transport_sas raid_class igb hwmon i2c_algo_bit i2c_core ptp pps_core mperf
Mar 23 01:30:14 Leviathan kernel: Pid: 27731, comm: shfs Not tainted 3.9.11p-unRAID #5 NetScout Systems, Inc. InfiniStream-69XX/X8DTH
Mar 23 01:30:14 Leviathan kernel: EIP: 0060:[<c10731d3>] EFLAGS: 00210246 CPU: 4
Mar 23 01:30:14 Leviathan kernel: EIP is at bdi_position_ratio+0x183/0x1e8
Mar 23 01:30:14 Leviathan kernel: EAX: 00000000 EBX: 000081e8 ECX: 00000002 EDX: 00000000
Mar 23 01:30:14 Leviathan kernel: ESI: 00000000 EDI: 00000000 EBP: e8eabcfc ESP: e8eabccc
Mar 23 01:30:14 Leviathan kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Mar 23 01:30:14 Leviathan kernel: CR0: 80050033 CR2: 40c9cf0c CR3: 28442000 CR4: 000007f0
Mar 23 01:30:14 Leviathan kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Mar 23 01:30:14 Leviathan kernel: DR6: ffff0ff0 DR7: 00000400
Mar 23 01:30:14 Leviathan kernel: Process shfs (pid: 27731, ti=e8eaa000 task=e8d39b00 task.ti=e8eaa000)
Mar 23 01:30:14 Leviathan kernel: Stack:
Mar 23 01:30:14 Leviathan kernel: e8eabcec 40f60000 00000000 00000000 ffffffff 00000001 000040f6 00000000
Mar 23 01:30:14 Leviathan kernel: fffffb08 000081df f74353d4 e8d39b00 e8eabd88 c1073be3 000081df 00000000
Mar 23 01:30:14 Leviathan kernel: 00000001 01188519 00000000 01188519 f74353f4 f743540c 00dc4501 c10f6ce0
Mar 23 01:30:14 Leviathan kernel: Call Trace:
Mar 23 01:30:14 Leviathan kernel: [<c1073be3>] balance_dirty_pages+0x25e/0x3ff
Mar 23 01:30:14 Leviathan kernel: [<c10f6ce0>] ? reiserfs_end_persistent_transaction+0x3d/0x44
Mar 23 01:30:14 Leviathan kernel: [<c1073e3f>] balance_dirty_pages_ratelimited+0xbb/0xc0
Mar 23 01:30:14 Leviathan kernel: [<c106c5b2>] generic_perform_write+0x15a/0x19d
Mar 23 01:30:14 Leviathan kernel: [<c106c63b>] generic_file_buffered_write+0x46/0x70
Mar 23 01:30:14 Leviathan kernel: [<c106d75d>] __generic_file_aio_write+0x36e/0x3ac
Mar 23 01:30:14 Leviathan kernel: [<c106d804>] generic_file_aio_write+0x69/0xc2
Mar 23 01:30:14 Leviathan kernel: [<c1096415>] do_sync_write+0x77/0xae
Mar 23 01:30:14 Leviathan kernel: [<c10e5df1>] reiserfs_file_write+0x66/0x6e
Mar 23 01:30:14 Leviathan kernel: [<c1096e75>] vfs_write+0x8e/0x110
Mar 23 01:30:14 Leviathan kernel: [<c10e5d8b>] ? reiserfs_file_open+0x53/0x53
Mar 23 01:30:14 Leviathan kernel: [<c1096f40>] sys_pwrite64+0x49/0x5f
Mar 23 01:30:14 Leviathan kernel: [<c1401190>] syscall_call+0x7/0xb
Mar 23 01:30:14 Leviathan kernel: [<c1400000>] ? __schedule+0x3b/0x490
Mar 23 01:30:14 Leviathan kernel: Code: 2b 45 e8 8b 75 ec 2b 75 10 8d 78 01 89 d8 0f af c6 89 45 ec 89 c8 8b 5d ec f7 e6 31 f6 01 d3 89 da 89 c3 39 fa 72 08 89 d0 31 d2 <f7> f7 89 c6 89 d8 f7 f7 89 f2 eb 1a 89 df 31 d2 c1 ff 1f c1 ff
Mar 23 01:30:14 Leviathan kernel: EIP: [<c10731d3>] bdi_position_ratio+0x183/0x1e8 SS:ESP 0068:e8eabccc
Mar 23 01:30:14 Leviathan kernel: ---[ end trace de2a2cca09293a04 ]---
Riot Posted March 23, 2016 (Author)

Ran reiserfsck on every disk except parity and flash. All of them came back with no corruption found.
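In case it helps anyone else following this thread, the check was something along these lines. The disk numbers below are just an example (on unRAID 5 the data disks show up as /dev/md1, /dev/md2, and so on); the commands are echoed as a dry run so nothing gets touched by accident, and --check is reiserfsck's read-only mode anyway:

```shell
# Read-only ReiserFS consistency check across the array's data disks.
# Disk numbers are illustrative; substitute your own array's /dev/mdN devices.
# Commands are echoed (dry run) rather than executed.
for n in 1 2 3; do
  echo "reiserfsck --check /dev/md$n"
done
```

Drop the echo once you're sure of the device names, and run it with the array stopped or the disks unmounted.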
RobJ Posted March 24, 2016

This is a crazy one! A divide-by-zero error is a software bug: either careless code or bad data, with the value not checked for zero before use. It seems impossible that no one would have run into it before now, but as far as I know, you're the first. No ideas; it doesn't seem possible. You do have an odd path above, though. Is that subfolder of Downloads just a hyphen, space, and TV? Is there possibly a bizarre character before that hyphen? Just grasping at straws...
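To illustrate what "not checked for zero before use" means in practice (purely a toy sketch, not the actual kernel code, which is C inside bdi_position_ratio), the whole class of bug comes down to a missing guard in front of a division:

```shell
# Toy illustration of a guarded division. If bad data lets a zero divisor
# reach the division unchecked, you get exactly this kind of divide error.
dirty=4096
limit=0   # bad data: a limit of zero arriving at the division
if [ "$limit" -eq 0 ]; then
  echo "limit is zero, skipping ratio calculation"
else
  echo $(( dirty * 100 / limit ))
fi
```

Without the if, `$(( dirty * 100 / limit ))` aborts with a division-by-zero error; in kernel space the equivalent is the oops in the syslog above.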
Riot Posted March 24, 2016 (Author)

I came across two others in the unRAID forums. One has pretty much the same syslog info, but it doesn't appear to have been resolved: http://lime-technology.com/forum/index.php?topic=31340.0 And another that's a bit different: http://lime-technology.com/forum/index.php?topic=32624.0

I've been running unRAID for 4+ years now with no major issues. Like I said, I replaced my board, CPU and RAM a month ago, but there were no problems until Sunday night, when I was testing the Kodi Chorus web interface. The issue first appeared while it was pulling a bunch of stuff from the server. I remember trying Chorus once before and seem to recall it causing issues, though I can't remember exactly what beyond bogging things down. Something was also coming in from Sickbeard at the same time. Maybe that corrupted MySQL a bit, which would explain why the problem popped up again when I tried to add things to the library a few days later. That's the only thing I can guess at.

I saw several bug reports about this issue in older Linux kernels, but supposedly they were fixed. I've run reiserfsck on everything and am currently going through the drives with long SMART tests. Everything has come out clean so far. dgaschk advised running new permissions in that first link, but I haven't done that yet.

I think the plan now is to upgrade to 6, redo all my plugins, and start converting disks to XFS. My biggest concern was that this was an issue with the new hardware, but I haven't seen anything to indicate that's the case, and you don't seem to think so either. So, dunno.

That path is just hyphen, space, TV. I keep the hyphen there so the folder stays at the top of the list and is easy to find. It's been like that for years without issue.
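For reference, the long SMART tests are just the standard smartctl invocations. The device name below is a placeholder (point it at each data drive in turn), and the commands are echoed as a dry run:

```shell
# Extended (long) SMART self-test, then read back the results.
# /dev/sdb is a placeholder device name; commands are echoed, not executed.
dev=/dev/sdb
echo "smartctl -t long $dev"   # starts the long self-test, which runs inside the drive firmware
echo "smartctl -a $dev"        # after it finishes, dumps attributes and the self-test log
```

The test runs in the background on the drive itself; `smartctl -a` (or `-l selftest`) shows whether it completed without error.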
Riot Posted March 25, 2016 (Author)

I asked clowrym, the OP of the first divide-error thread I found, over PM, and he said:

"If I remember correctly, my issue was due to a failing disk #5 in the end, although nothing was showing in the SMART data. A few weeks after those threads it red-balled; I replaced it and didn't have any issues after. I did end up replacing the power connect with an HP 24-port managed switch about the same time as replacing the drive, but I truly do believe the drive was my issue. Once I removed it, I tried a preclear on the drive, but it wouldn't pass."

A friend more knowledgeable than me thinks there's either corruption that reiserfsck doesn't see but ReiserFS can't understand, in which case the filesystem will panic whenever it tries to traverse that part of the tree, or there's a bad spot on the disk that it's choking on sporadically: fine sometimes, puking other times.

So I guess I'll find out whether the drive dies soon, or after moving to XFS. Hopefully it's not the drive, because that would make it the second time I've bought a new 4 TB intending to move to XFS, only to end up using it to replace a dying drive instead.
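If it helps anyone planning the same ReiserFS-to-XFS move, the usual pattern is one disk at a time: copy a ReiserFS disk's contents onto a freshly formatted XFS disk, verify, then reuse the emptied disk for the next slot. Everything below is a sketch with hypothetical mount points, echoed as a dry run:

```shell
# One-disk-at-a-time migration sketch. Mount points are illustrative;
# parity stays intact throughout. Commands are echoed, not executed.
src=/mnt/disk1        # existing ReiserFS disk
dst=/mnt/disk5        # new disk, already formatted as XFS
echo "rsync -avPX $src/ $dst/"             # copy everything, preserving attributes/xattrs
echo "rsync -avPX --checksum $src/ $dst/"  # second pass: verify by checksum, copy any differences
```

The trailing slashes matter to rsync (copy the contents of src into dst, not src itself as a subfolder).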