AndreM, posted October 10, 2011

Hi there,

I recently built my unRAID server and decided to go with unRAID 5 beta12a. However, I've been getting lots of kernel panics, and the server becomes unresponsive. Where should I start looking? Is this related to the beta version, or is it a hardware issue?

I'm using an Asus P5B motherboard with an Adaptec 1430SA controller card, 2x 2TB hard drives, and 3x 1TB drives (with another 4x 1TB drives I still need to pre-clear and add).

Can I downgrade to 4.7 without losing my existing data?

Below are the messages from the syslog:

Oct 10 00:13:53 pooh kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000f
Oct 10 00:13:53 pooh kernel: IP: [<c10ce949>] prepare_error_buf+0x57/0x3e5
Oct 10 00:13:53 pooh kernel: *pdpt = 0000000006e51001 *pde = 0000000000000000
Oct 10 00:13:53 pooh kernel: Oops: 0000 [#1] SMP
Oct 10 00:13:53 pooh kernel: Modules linked in: md_mod ntfs xor ide_gd_mod pata_jmicron asus_atk0110 r8168 hwmon sata_mv i2c_i801 i2c_core jmicron ata_piix ahci libahci [last unloaded: md_mod]
Oct 10 00:13:53 pooh kernel:
Oct 10 00:13:53 pooh kernel: Pid: 14317, comm: shfs Not tainted 3.0.3-unRAID #7 System manufacturer System Product Name/P5B
Oct 10 00:13:53 pooh kernel: EIP: 0060:[<c10ce949>] EFLAGS: 00010286 CPU: 1
Oct 10 00:13:53 pooh kernel: EIP is at prepare_error_buf+0x57/0x3e5
Oct 10 00:13:53 pooh kernel: EAX: c14ca08a EBX: c139f34b ECX: 00000000 EDX: c14ca08a
Oct 10 00:13:53 pooh kernel: ESI: ffffffff EDI: c1767e14 EBP: c1767de4 ESP: c1767d7c
Oct 10 00:13:53 pooh kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Oct 10 00:13:53 pooh kernel: Process shfs (pid: 14317, ti=c1766000 task=f76a3960 task.ti=c1766000)
Oct 10 00:13:53 pooh kernel: Stack:
Oct 10 00:13:53 pooh kernel: c1a5b000 c1767db8 c1767e80 c1767e3c c1767e80 c1767e00 c10d4004 00000001
Oct 10 00:13:53 pooh kernel: 00000000 00001000 f76b0000 c10ce16d c14ca48a f0ad2f00 c14ca08a 00000000
Oct 10 00:13:53 pooh kernel: 00000026 c1767e28 c1767e28 c10d89bf 00000000 00000000 00000000 f76b0000
Oct 10 00:13:53 pooh kernel: Call Trace:
Oct 10 00:13:53 pooh kernel: [<c10d4004>] ? search_for_position_by_key+0x32/0x24b
Oct 10 00:13:53 pooh kernel: [<c10ce16d>] ? add_save_link+0x15f/0x1a6
Oct 10 00:13:53 pooh kernel: [<c10d89bf>] ? do_journal_end+0x908/0x92a
Oct 10 00:13:53 pooh kernel: [<c10cf4c9>] __reiserfs_error+0x1a/0xa9
Oct 10 00:13:53 pooh kernel: [<c10d51b1>] reiserfs_do_truncate+0x15c/0x479
Oct 10 00:13:53 pooh kernel: [<c10dc75f>] ? reiserfs_for_each_xattr+0x6e/0x1fe
Oct 10 00:13:53 pooh kernel: [<c10d54fc>] reiserfs_delete_object+0x2e/0x62
Oct 10 00:13:53 pooh kernel: [<c10c561a>] reiserfs_evict_inode+0x7c/0xd5
Oct 10 00:13:53 pooh kernel: [<c109022e>] evict+0x59/0xec
Oct 10 00:13:53 pooh kernel: [<c1090548>] iput_final+0xea/0xef
Oct 10 00:13:53 pooh kernel: [<c1090577>] iput+0x2a/0x2d
Oct 10 00:13:53 pooh kernel: [<c10898b4>] do_unlinkat+0xbe/0x108
Oct 10 00:13:53 pooh kernel: [<c10881e4>] ? path_lookupat+0x16f/0x4ba
Oct 10 00:13:53 pooh kernel: [<c108990e>] sys_unlink+0x10/0x12
Oct 10 00:13:53 pooh kernel: [<c130ab4d>] syscall_call+0x7/0xb
Oct 10 00:13:53 pooh kernel: [<c1300000>] ? quirk_usb_disable_ehci+0x84/0x129
Oct 10 00:13:53 pooh kernel: Code: 45 d0 6c a0 4c c1 89 45 9c e9 8c 02 00 00 8b 16 8d 5e 04 8b 45 d0 89 de e8 c6 f8 ff ff e9 64 02 00 00 8d 7e 04 8b 36 85 f6 74 66 <8a> 46 10 bb 72 c1 3c c1 84 c0 74 20 3c 03 0f 84 31 03 00 00 3c
Oct 10 00:13:53 pooh kernel: EIP: [<c10ce949>] prepare_error_buf+0x57/0x3e5 SS:ESP 0068:c1767d7c
Oct 10 00:13:53 pooh kernel: CR2: 000000000000000f
Oct 10 00:13:53 pooh kernel: ---[ end trace ffe0c3ead5183a95 ]---
Oct 10 00:13:53 pooh kernel: ------------[ cut here ]------------
Oct 10 00:13:53 pooh kernel: WARNING: at kernel/exit.c:909 do_exit+0x2c/0x274()
Oct 10 00:13:53 pooh kernel: Hardware name: System Product Name
Oct 10 00:13:53 pooh kernel: Modules linked in: md_mod ntfs xor ide_gd_mod pata_jmicron asus_atk0110 r8168 hwmon sata_mv i2c_i801 i2c_core jmicron ata_piix ahci libahci [last unloaded: md_mod]
Oct 10 00:13:53 pooh kernel: Pid: 14317, comm: shfs Tainted: G D 3.0.3-unRAID #7
Oct 10 00:13:53 pooh kernel: Call Trace:
Oct 10 00:13:53 pooh kernel: [<c10288ac>] warn_slowpath_common+0x65/0x7a
Oct 10 00:13:53 pooh kernel: [<c102b724>] ? do_exit+0x2c/0x274
Oct 10 00:13:53 pooh kernel: [<c10288d0>] warn_slowpath_null+0xf/0x13
Oct 10 00:13:53 pooh kernel: [<c102b724>] do_exit+0x2c/0x274
Oct 10 00:13:53 pooh kernel: [<c10048b5>] oops_end+0x75/0x7c
Oct 10 00:13:53 pooh kernel: [<c101b0c1>] no_context+0xac/0xb6
Oct 10 00:13:53 pooh kernel: [<c101b1b3>] __bad_area_nosemaphore+0xe8/0xf0
Oct 10 00:13:53 pooh kernel: [<c101b36a>] ? mm_fault_error+0x129/0x129
Oct 10 00:13:53 pooh kernel: [<c101b200>] bad_area+0x35/0x3b
Oct 10 00:13:53 pooh kernel: [<c101b516>] do_page_fault+0x1ac/0x332
Oct 10 00:13:53 pooh kernel: [<c101b36a>] ? mm_fault_error+0x129/0x129
Oct 10 00:13:53 pooh kernel: [<c130b14a>] error_code+0x5a/0x60
Oct 10 00:13:53 pooh kernel: [<c101b36a>] ? mm_fault_error+0x129/0x129
Oct 10 00:13:53 pooh kernel: [<c10ce949>] ? prepare_error_buf+0x57/0x3e5
Oct 10 00:13:53 pooh kernel: [<c10d4004>] ? search_for_position_by_key+0x32/0x24b
Oct 10 00:13:53 pooh kernel: [<c10ce16d>] ? add_save_link+0x15f/0x1a6
Oct 10 00:13:53 pooh kernel: [<c10d89bf>] ? do_journal_end+0x908/0x92a
Oct 10 00:13:53 pooh kernel: [<c10cf4c9>] __reiserfs_error+0x1a/0xa9
Oct 10 00:13:53 pooh kernel: [<c10d51b1>] reiserfs_do_truncate+0x15c/0x479
Oct 10 00:13:53 pooh kernel: [<c10dc75f>] ? reiserfs_for_each_xattr+0x6e/0x1fe
Oct 10 00:13:53 pooh kernel: [<c10d54fc>] reiserfs_delete_object+0x2e/0x62
Oct 10 00:13:53 pooh kernel: [<c10c561a>] reiserfs_evict_inode+0x7c/0xd5
Oct 10 00:13:53 pooh kernel: [<c109022e>] evict+0x59/0xec
Oct 10 00:13:53 pooh kernel: [<c1090548>] iput_final+0xea/0xef
Oct 10 00:13:53 pooh kernel: [<c1090577>] iput+0x2a/0x2d
Oct 10 00:13:53 pooh kernel: [<c10898b4>] do_unlinkat+0xbe/0x108
Oct 10 00:13:53 pooh kernel: [<c10881e4>] ? path_lookupat+0x16f/0x4ba
Oct 10 00:13:53 pooh kernel: [<c108990e>] sys_unlink+0x10/0x12
Oct 10 00:13:53 pooh kernel: [<c130ab4d>] syscall_call+0x7/0xb
Oct 10 00:13:53 pooh kernel: [<c1300000>] ? quirk_usb_disable_ehci+0x84/0x129
Oct 10 00:13:53 pooh kernel: ---[ end trace ffe0c3ead5183a96 ]---
Joe L., posted October 10, 2011

I've seen ReiserFS file system corruption cause kernel oopses in the past. I'd start with a file system check of each of your disks (least effort, and perhaps the cure):
http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
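One rough way to see why the call trace points at the file system is to pull the function names out of it. The Python below is a minimal sketch built around a hypothetical helper (it is not an unRAID or kernel tool). It assumes frames look like [<addr>] name+0xoff/0xlen, and it drops frames marked "?", which the kernel prints when it finds an address on the stack that may not belong to the real call chain. Matching on the substring "reiserfs" is only a heuristic: helpers such as prepare_error_buf and do_journal_end also live in the reiserfs code but do not carry the prefix.

```python
import re

# Matches call-trace frames such as
#   "[<c10d51b1>] reiserfs_do_truncate+0x15c/0x479"
#   "[<c10d4004>] ? search_for_position_by_key+0x32/0x24b"
# The optional "? " marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return (function_name, reliable) tuples for every frame in log_text."""
    frames = []
    for match in FRAME_RE.finditer(log_text):
        reliable = match.group("unreliable") is None
        frames.append((match.group("func"), reliable))
    return frames


def reiserfs_frames(log_text):
    """Reliable frames whose name contains 'reiserfs' (a rough heuristic)."""
    return [name for name, ok in parse_frames(log_text) if ok and "reiserfs" in name]


if __name__ == "__main__":
    sample = (
        "Oct 10 00:13:53 pooh kernel: [<c10d4004>] ? search_for_position_by_key+0x32/0x24b\n"
        "Oct 10 00:13:53 pooh kernel: [<c10cf4c9>] __reiserfs_error+0x1a/0xa9\n"
        "Oct 10 00:13:53 pooh kernel: [<c10d51b1>] reiserfs_do_truncate+0x15c/0x479\n"
        "Oct 10 00:13:53 pooh kernel: [<c10dc75f>] ? reiserfs_for_each_xattr+0x6e/0x1fe\n"
        "Oct 10 00:13:53 pooh kernel: [<c10898b4>] do_unlinkat+0xbe/0x108\n"
    )
    print(reiserfs_frames(sample))  # ['__reiserfs_error', 'reiserfs_do_truncate']
```

Paste the whole oops into log_text to get the full list; a trace that runs from sys_unlink straight into reiserfs error handling is the pattern that makes a file system check the cheapest first step.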
AndreM (author), posted October 10, 2011

Cool, thank you for that. I'll give it a try this evening.

I wrote the initial post in a bit of a hurry, so it was light on detail. I had an external USB drive plugged into the unRAID server, created a temporary directory called /backup, and mounted the ext3 file system from the USB drive there. I then used the Telnet shell to copy files from /backup to /mnt/user/Backup (a user share I had created). It ran for a while (maybe an hour), and then the unRAID server rebooted by itself and got stuck in the BIOS. I had to power cycle it.

I then ran the memory test included on the unRAID USB flash drive, and it found RAM errors. The system had 2x 2GB and 2x 1GB RAM modules. I removed the 2x 1GB modules (leaving the system with 4GB) and re-ran the memory test. I let it complete to 100%, and it found no errors.

I then unplugged the external USB drive, booted the unRAID server, and let it finish its parity check. According to the web interface, it found and corrected 15 errors.

Next, I tried to delete the files in the Backup user share from a Windows 7 system, and halfway through the delete I got that kernel oops. I rebooted and tried again, and the same thing happened. So there is likely some file system corruption left over from the first crash, possibly caused by the faulty RAM.
AndreM (author), posted October 10, 2011

This was the end of the reiserfsck output:

bad_indirect_item: block 284819480: The item (1871 1937 0x1 IND (1), len 1452, location 1128 entry count 0, fsck need 0, format new) has the bad pointer (362) to the block (370795981), which is in tree already
bad_stat_data: The objectid (1938) is shared by at least two files. Can be fixed with --rebuild-tree only.
bad_path: The left delimiting key [8208 8359 0x35001 IND (1)] of the node (284819480) must be equal to the first element's key [1871 1936 0x1 IND (1)] within the node.
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Fatal corruptions were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree

The parity check is still running from this morning, so I'll let it finish before trying to fix the reiserfs problems.
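The last summary line of a reiserfsck --check run is what decides the repair mode. As a minimal sketch, the Python below maps that summary to the next reiserfsck command line without running it. The helper name and the /dev/md1 device path are assumptions for illustration; the wiki page Joe linked describes which device to check on unRAID. It only builds the argument list, so you can review it and back up what you can before running it yourself (for example with subprocess.run). --rebuild-tree is checked first because the summary states those corruptions can be fixed only in that mode.

```python
import re

# Summary lines printed at the end of "reiserfsck --check", for example:
#   "1 found corruptions can be fixed only when running with --rebuild-tree"
#   "1 found corruptions can be fixed when running with --fix-fixable"
REBUILD_RE = re.compile(
    r"(\d+) found corruptions can be fixed only when running with --rebuild-tree"
)
FIXABLE_RE = re.compile(
    r"(\d+) found corruptions can be fixed when running with --fix-fixable"
)


def next_reiserfsck_step(check_output, device):
    """Suggest the next reiserfsck argv (as a list) from --check output.

    Returns None when no summary line asks for a repair. The command is
    only built here, never executed.
    """
    if REBUILD_RE.search(check_output):
        # Fatal corruptions: only a full tree rebuild can repair them.
        return ["reiserfsck", "--rebuild-tree", device]
    if FIXABLE_RE.search(check_output):
        return ["reiserfsck", "--fix-fixable", device]
    return None


if __name__ == "__main__":
    summary = "1 found corruptions can be fixed only when running with --rebuild-tree"
    # /dev/md1 is a hypothetical example device.
    print(next_reiserfsck_step(summary, "/dev/md1"))  # ['reiserfsck', '--rebuild-tree', '/dev/md1']
```

Keeping the decision separate from execution is deliberate: --rebuild-tree rewrites the file system tree, so it is worth reading the suggested command before running it.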
prostuff1, posted October 10, 2011

Run memtest at least overnight, and for 24 hours if you can. I run a full 24 hours of Memtest on every server I build for customers. It is one of the easiest things to do and will save you time in the long run.
AndreM (author), posted October 11, 2011

Since running reiserfsck and letting it fix the errors, I have not had another kernel panic. Thanks!
AndreM (author), posted October 18, 2011

I know I already marked this as solved, but here is a follow-up in case someone else has a similar issue.

After repairing the file system, I no longer got the 'kernel oops' problems, but my unRAID system kept rebooting after a few hours of use. I think one of these random reboots is what caused the file system corruption in the first place.

Having experienced similar issues before with a randomly rebooting system, I suspected the power supply. It was a RaidMax 630W modular power supply. It served me well, but it was over 7 years old. I replaced it with a Corsair CX600, and the system has been rock solid since.

The system currently has 7 drives in it, but is designed to take up to 12 LP/green drives.