June 21, 2007

Just a little background: When I first set up Unraid 3.0 with 2 or 3 drives, performance was decent. I'd usually see transfers of ~16 MBytes/sec (IIRC). The exact transfer speed is not important; however, the scale will matter here in a minute. There were the occasional dips but that was about average. Over time, performance dropped to about 7-8 MB/sec. I figured maybe it was because I added more drives.

So yesterday I popped in to see if there were any updates to Unraid and lo and behold, ver 4.0. So I upgraded, which proved to be a PITA thanks to SanDisk's nefarious U3 filesystem on the USB stick. Windows would only recognize it as a CD-ROM, which prevented me from renaming the volume. A little digging around and I found a U3 removal tool linked from here. I backed up the stick and the removal tool proceeded to wipe the stick along with the U3 stuff. I copied the contents back over, named the volume UNRAID and followed the upgrade instructions.

Popped the stick back in the server, assigned the drives and everything seemed OK; however, my server became dog slow: ~1 MB a second. I cat'd my syslog and snooped around a bit. It would appear that my 250 GB drive is having issues. What's odd is that I never saw any indication in the web GUI, error-wise. I see I'm also only running SATA at 1.5 Gb/s. Is this merely a setup issue with my Promise card? There are also some bug reports in there. I'm not sure if they mean anything as I'm not a Unix guru by any stretch. If you get a chance, please take a look.

Thanks,
Ryan
June 21, 2007

The disk errors resulted in DMA mode being turned off on your parity drive. That is why you are running slow. Turning off DMA mode allowed your disk to run without additional errors, but slowly. These things happen under Windows too, but most people never notice the performance drop (and the Windows OS gets re-booted frequently anyway).

A re-boot will fix the speed and re-enable DMA mode... If the errors continue, try a different cable to your hda/hdb drive pair. Make sure to use a high-quality flat, 80 conductor cable. Do not use a round cable (they can cause many errors).

It also seemed to have caused a kernel hiccup.

Jun 20 22:42:11 (none) kernel: [ 463.857777] ide: failed opcode was: unknown
Jun 20 22:42:11 (none) kernel: [ 463.857781] hda: DMA disabled
Jun 20 22:42:11 (none) kernel: [ 463.898983] ide0: reset: success
Jun 20 22:43:35 (none) kernel: [ 547.897726] ------------[ cut here ]------------
Jun 20 22:43:35 (none) kernel: [ 547.897803] kernel BUG at fs/reiserfs/bitmap.c:1287!
Jun 20 22:43:35 (none) kernel: [ 547.897874] invalid opcode: 0000 [#1]
Jun 20 22:43:35 (none) kernel: [ 547.897941] Modules linked in: md_mod skge sata_promise ata_piix libata
Jun 20 22:43:35 (none) kernel: [ 547.898191] CPU: 0
Jun 20 22:43:35 (none) kernel: [ 547.898192] EIP: 0060:[<c0175bda>] Not tainted VLI
Jun 20 22:43:35 (none) kernel: [ 547.898193] EFLAGS: 00010246 (2.6.20 #37)
Jun 20 22:43:35 (none) kernel: [ 547.898400] EIP is at reiserfs_cache_bitmap_metadata+0x69/0x71
Jun 20 22:43:35 (none) kernel: [ 547.898471] eax: f2db5000 ebx: f883ed0c ecx: ffffffff edx: f2db4ffc
Jun 20 22:43:35 (none) kernel: [ 547.898544] esi: f0addb70 edi: 00000000 ebp: f883ed0c esp: f1917c1c
Jun 20 22:43:35 (none) kernel: [ 547.898617] ds: 007b es: 007b ss: 0068
Jun 20 22:43:35 (none) kernel: [ 547.898686] Process smbd (pid: 1135, ti=f1916000 task=de520070 task.ti=f1916000)
Jun 20 22:43:35 (none) kernel: [ 547.898758] Stack: f0addb70 03a18000 dfe26800 c0175c92 00001000 f1917ec4 f0ce007b c161007b
Jun 20 22:43:35 (none) kernel: [ 547.899127] 00000743 f1917cac 00000e8e dfe26800 c01745e7 00000002 f0ce02b0 04040404
Jun 20 22:43:35 (none) kernel: [ 547.899494] 00000000 f883ed0c 00000743 f1917ec4 542e1a94 3e846bff 00000743 dfe26800
Jun 20 22:43:35 (none) kernel: [ 547.899861] Call Trace:
Jun 20 22:43:35 (none) kernel: [ 547.899989] [<c0175c92>] reiserfs_read_bitmap_block+0xb0/0xba
Jun 20 22:43:35 (none) kernel: [ 547.900091] [<c01745e7>] scan_bitmap_block+0x63/0x227
Jun 20 22:43:35 (none) kernel: [ 547.900192] [<c0174a13>] scan_bitmap+0x1a2/0x1fb
Jun 20 22:43:35 (none) kernel: [ 547.900292] [<c0175a2a>] reiserfs_allocate_blocknrs+0x2cd/0x3d4
Jun 20 22:43:35 (none) kernel: [ 547.900394] [<c017f82b>] reiserfs_allocate_blocks_for_region+0x1f1/0x1156
Jun 20 22:43:35 (none) kernel: [ 547.900497] [<c01c1062>] radix_tree_node_alloc+0x16/0x4f
Jun 20 22:43:35 (none) kernel: [ 547.900600] [<c01c11ce>] radix_tree_insert+0x59/0xfb
Jun 20 22:43:35 (none) kernel: [ 547.900700] [<c0161d83>] alloc_buffer_head+0x1e/0x22
Jun 20 22:43:36 (none) kernel: [ 547.900800] [<c01600b3>] create_empty_buffers+0x10/0x64
Jun 20 22:43:36 (none) kernel: [ 547.900901] [<c0180ea2>] reiserfs_prepare_file_region_for_write+0x10e/0x77b
Jun 20 22:43:36 (none) kernel: [ 547.901004] [<c028be59>] skb_copy_datagram_iovec+0x3b/0x1cc
Jun 20 22:43:36 (none) kernel: [ 547.901105] [<c0288c43>] release_sock+0x9/0x45
Jun 20 22:43:36 (none) kernel: [ 547.901206] [<c0181869>] reiserfs_file_write+0x35a/0x4b9
Jun 20 22:43:36 (none) kernel: [ 547.901307] [<c0149b0b>] cp_new_stat64+0xf6/0x108
Jun 20 22:43:36 (none) kernel: [ 547.901408] [<c0149b81>] sys_fstat64+0x1e/0x23
Jun 20 22:43:36 (none) kernel: [ 547.901508] [<c0147342>] vfs_write+0x8b/0x130
Jun 20 22:43:36 (none) kernel: [ 547.901606] [<c014755c>] sys_pwrite64+0x48/0x5f
Jun 20 22:43:36 (none) kernel: [ 547.901705] [<c0102aa4>] syscall_call+0x7/0xb
Jun 20 22:43:36 (none) kernel: [ 547.901805] =======================
Jun 20 22:43:36 (none) kernel: [ 547.901872] Code: 66 89 3b eb db 40 74 d8 b9 1f 00 00 00 0f a3 0a 19 c0 85 c0 75 0a 8d 04 0f 66 ff 43 02 66 89 03 49 79 ea eb bb 66 83 3b 00 75 04 <0f> 0b eb fe 5b 5e 5f c3 55 89 d1 57 89 c7 56 53 83 ec 10 8b 58
Jun 20 22:43:36 (none) kernel: [ 547.903905] EIP: [<c0175bda>] reiserfs_cache_bitmap_metadata+0x69/0x71 SS:ESP 0068:f1917c1c
Jun 20 22:43:36 (none) kernel: [ 547.904072] BUG: at kernel/exit.c:860 do_exit()
Jun 20 22:43:36 (none) kernel: [ 547.904177] [<c0117a4e>] do_exit+0x42/0x2fb
Jun 20 22:43:36 (none) kernel: [ 547.904277] [<c0103c55>] die+0x197/0x19f
Jun 20 22:43:36 (none) kernel: [ 547.904377] [<c0103f00>] do_invalid_op+0x0/0x99
Jun 20 22:43:36 (none) kernel: [ 547.904476] [<c0103f90>] do_invalid_op+0x90/0x99
Jun 20 22:43:36 (none) kernel: [ 547.904575] [<c0175bda>] reiserfs_cache_bitmap_metadata+0x69/0x71
Jun 20 22:43:36 (none) kernel: [ 547.904678] [<c02d1d7b>] io_schedule+0xe/0x16
Jun 20 22:43:36 (none) kernel: [ 547.904778] [<c02d1ea3>] __wait_on_bit+0x4a/0x51
Jun 20 22:43:36 (none) kernel: [ 547.904877] [<c02d1f19>] out_of_line_wait_on_bit+0x6f/0x77
Jun 20 22:43:36 (none) kernel: [ 547.904977] [<c015f01f>] sync_buffer+0x0/0x2e
Jun 20 22:43:36 (none) kernel: [ 547.905076] [<c01247cc>] wake_bit_function+0x0/0x3c
Jun 20 22:43:36 (none) kernel: [ 547.905177] [<c02d26b4>] error_code+0x74/0x7c
Jun 20 22:43:36 (none) kernel: [ 547.905277] [<f883007b>] ata_bmdma_drive_eh+0x8f/0xf4 [libata]
Jun 20 22:43:36 (none) kernel: [ 547.905390] [<c0175bda>] reiserfs_cache_bitmap_metadata+0x69/0x71
Jun 20 22:43:36 (none) kernel: [ 547.905492] [<c0175c92>] reiserfs_read_bitmap_block+0xb0/0xba
Jun 20 22:43:36 (none) kernel: [ 547.905593] [<c01745e7>] scan_bitmap_block+0x63/0x227
Jun 20 22:43:36 (none) kernel: [ 547.905693] [<c0174a13>] scan_bitmap+0x1a2/0x1fb
Jun 20 22:43:36 (none) kernel: [ 547.905793] [<c0175a2a>] reiserfs_allocate_blocknrs+0x2cd/0x3d4
Jun 20 22:43:36 (none) kernel: [ 547.905895] [<c017f82b>] reiserfs_allocate_blocks_for_region+0x1f1/0x1156
Jun 20 22:43:36 (none) kernel: [ 547.905998] [<c01c1062>] radix_tree_node_alloc+0x16/0x4f
Jun 20 22:43:36 (none) kernel: [ 547.906098] [<c01c11ce>] radix_tree_insert+0x59/0xfb
Jun 20 22:43:36 (none) kernel: [ 547.906198] [<c0161d83>] alloc_buffer_head+0x1e/0x22
Jun 20 22:43:36 (none) kernel: [ 547.906297] [<c01600b3>] create_empty_buffers+0x10/0x64
Jun 20 22:43:36 (none) kernel: [ 547.906397] [<c0180ea2>] reiserfs_prepare_file_region_for_write+0x10e/0x77b
Jun 20 22:43:36 (none) kernel: [ 547.906499] [<c028be59>] skb_copy_datagram_iovec+0x3b/0x1cc
Jun 20 22:43:36 (none) kernel: [ 547.906600] [<c0288c43>] release_sock+0x9/0x45
Jun 20 22:43:36 (none) kernel: [ 547.906700] [<c0181869>] reiserfs_file_write+0x35a/0x4b9
Jun 20 22:43:36 (none) kernel: [ 547.906800] [<c0149b0b>] cp_new_stat64+0xf6/0x108
Jun 20 22:43:36 (none) kernel: [ 547.906901] [<c0149b81>] sys_fstat64+0x1e/0x23
Jun 20 22:43:36 (none) kernel: [ 547.906999] [<c0147342>] vfs_write+0x8b/0x130
Jun 20 22:43:36 (none) kernel: [ 547.907098] [<c014755c>] sys_pwrite64+0x48/0x5f
Jun 20 22:43:36 (none) kernel: [ 547.907196] [<c0102aa4>] syscall_call+0x7/0xb
Jun 20 22:43:36 (none) kernel: [ 547.907295] =======================
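For anyone who wants to check their own syslog for the same two symptoms (a drive dropping out of DMA mode and a kernel BUG), here is a minimal Python sketch. The function name `scan_syslog` and the two regular expressions are assumptions modelled only on the log lines quoted above; other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the excerpt above (assumption: same message wording).
DMA_OFF = re.compile(r"kernel: \[\s*[\d.]+\]\s+(hd[a-z]): DMA disabled")
KERNEL_BUG = re.compile(r"kernel BUG at (\S+?)!")


def scan_syslog(lines):
    """Return (drives that had DMA disabled, source locations of kernel BUGs)."""
    dma_off, bugs = [], []
    for line in lines:
        match = DMA_OFF.search(line)
        if match and match.group(1) not in dma_off:
            dma_off.append(match.group(1))
        match = KERNEL_BUG.search(line)
        if match:
            bugs.append(match.group(1))
    return dma_off, bugs


# Three lines copied from the log in this post.
sample = [
    "Jun 20 22:42:11 (none) kernel: [ 463.857781] hda: DMA disabled",
    "Jun 20 22:42:11 (none) kernel: [ 463.898983] ide0: reset: success",
    "Jun 20 22:43:35 (none) kernel: [ 547.897803] kernel BUG at fs/reiserfs/bitmap.c:1287!",
]
print(scan_syslog(sample))
```

To run it against a real log, pass an open file object instead of `sample`; the function only iterates over lines.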
June 21, 2007 Author

Joe - Thanks for looking at the log. I don't think it's the parity drive causing the hiccup though, as it's not an IDE drive. The parity drive is a Seagate 500GB SATA. I did a "hdparm -i /dev/hda" from the command line and it comes back as my 200GB data drive.

As an update to the former situation, things are getting worse. I cannot write to disk7. Any attempt to write will cause a hang (I use Windows (Total) Commander for file management). Eventually, it will return a "Disk full" message even though there is 4.6 GB of space left. The other drives *appear* good but all I've done are reads on them.

Also, I cannot shut the server down properly. Using the GUI fails to stop the array. I'm guessing Samba gets shut down in the process, as the GUI won't refresh. If I telnet in, it will let me unmount most of the drives save one. It comes back as "busy". No amount of coaxing will unmount it. I tried to issue a couple of "kill -9 <pid>" commands but the processes won't go away. I really hate just flipping a switch. This has happened on 4 occasions now: twice under ver 3.0, twice under 4.0.

I just got another 500GB drive in today (hence why I was shutting down). I'll take a peek and see if I can find any loose cables. Any help would be greatly appreciated!

Ryan

P.S. Attached is the most recent syslog
June 21, 2007

No problem... most of your errors seemed to be on hdb, not hda, although hda had DMA shut down... Both are on the same cable... might be either... or the cable... or the disk controller card...

I'm still used to the PATA naming where hda was the parity drive... in any case, good luck.
June 22, 2007

It appears there may be file system corruption on one of your data disks. Refer to this page on the wiki for instructions on how to check and fix that.

This type of "corruption" seems to happen only after upgrading from 3.x to 4.0. I've been spending some time trying to track down this problem, without a clear explanation yet. I suspect it's not really corruption in the sense of data being lost, because we haven't had any reports yet of data being lost. It probably has to do with reiserfs in the Linux 2.6 kernel not being able to cope with some quirk in the file system created under the Linux 2.4 kernel. unRAID ver 4.0 is based on Linux 2.6.20. The next release, unRAID ver 4.1, will be based on 2.6.21, and there have been reiserfs changes.
June 24, 2007 Author

I repaired the corruption on disks 1 & 7. Disk 1 required the "--rebuild-tree" option but it straightened itself out. Disk 7 was fixed with the "--fix-fixable" option. The rest of the drives came back OK according to the standard reiserfsck. I figured I would go ahead and run it on the parity drive if there was no objection.

One thing that concerns me though is the drop in write performance. I used to get ~15,000-16,000 kbytes/sec on average. Recently, I've been averaging ~4,000-5,000 kbytes/sec. The last three drives I've bought were SATA 3.0 Gb/s, 500 GB, 16MB cache drives (a total of 9 drives now). Am I seeing a bottleneck because of the number of drives in the array? Read performance is great; I've been streaming HD DVDs with no stutters.

Just a suggestion for the next version: is there a way you could integrate a maintenance tab into the next GUI? It would be nice if we could run (schedule?) the occasional reiserfsck from there to keep tabs on the drives. Maybe dump the output to a file viewable in the GUI as well.

Thanks,
Ryan
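The three levels of reiserfsck described in this post (plain check, fix what is fixable, rebuild the tree) can be sketched as a small Python helper that only builds the command line and does not run anything. The helper name `reiserfsck_command` and the `/dev/mdN` device naming are assumptions for illustration; the wiki page referred to earlier in the thread remains the procedure to follow, and the disk must not be mounted while the tool runs.

```python
# Map a friendly mode name to the reiserfsck option it corresponds to.
MODES = {
    "check": "--check",          # read-only consistency check
    "fix": "--fix-fixable",      # repair problems that do not need a tree rebuild
    "rebuild": "--rebuild-tree", # last resort: rebuild the whole tree
}


def reiserfsck_command(disk_number, mode="check"):
    """Build (but do not execute) a reiserfsck command for one array disk.

    Assumption: array disk N is exposed as /dev/mdN.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["reiserfsck", MODES[mode], f"/dev/md{disk_number}"]


# Print the commands matching what was done for disks 7 and 1 in this post.
print(" ".join(reiserfsck_command(7, "fix")))
print(" ".join(reiserfsck_command(1, "rebuild")))
```

Starting with `"check"` and escalating only when the tool itself recommends it keeps the riskier options for the cases that need them.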
June 25, 2007

Ryan,

Off topic, sorry. How are you able to stream HD DVDs? I can only assume that you're running a gigabit network. I have a 100M network and can't get PowerDVD to stream, but it works with local drives no problem.
June 25, 2007 Author

> Ryan, Off topic, sorry. How are you able to stream HD DVDs? I can only assume that you're running a gigabit network. I have a 100M network and can't get PowerDVD to stream, but it works with local drives no problem.

Yes, I'm running gigabit between my HTPC and Unraid server. No switches, just a crossover cable in between. Bandwidth doesn't *appear* to be an issue, at least on the read side.

Take Care,
Ryan
July 4, 2007

> I repaired the corruption on disks 1 & 7. Disk 1 required the "--rebuild-tree" option but it straightened itself out. Disk 7 was fixed with the "--fix-fixable" option. The rest of the drives came back OK according to the standard reiserfsck. I figured I would go ahead and run it on the parity drive if there was no objection.

The parity drive contains no file system, so there is no point in running reiserfsck on it.

> One thing that concerns me though is the drop in write performance. I used to get ~15,000-16,000 kbytes/sec on average. Recently, I've been averaging ~4,000-5,000 kbytes/sec. The last three drives I've bought were SATA 3.0 Gb/s, 500 GB, 16MB cache drives (a total of 9 drives now). Am I seeing a bottleneck because of the number of drives in the array? Read performance is great; I've been streaming HD DVDs with no stutters.

Yes, 4-5 MB/sec is slow. How are you measuring it?

> Just a suggestion for the next version: is there a way you could integrate a maintenance tab into the next GUI? It would be nice if we could run (schedule?) the occasional reiserfsck from there to keep tabs on the drives. Maybe dump the output to a file viewable in the GUI as well.

In the works.
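On the question of how the write speed is measured: one repeatable way is to time a large sequential write and divide the size by the elapsed time. The following is a minimal Python sketch under stated assumptions; the function name `measure_write_mb_per_s`, the test file name, and the default sizes are all hypothetical, and the figure it reports includes network and protocol overhead when the target directory is a network share.

```python
import os
import tempfile
import time


def measure_write_mb_per_s(directory, size_mb=64, chunk_mb=1):
    """Write size_mb of zeros into directory and return throughput in MB/sec."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    path = os.path.join(directory, "throughput_test.tmp")  # hypothetical file name
    start = time.perf_counter()
    try:
        with open(path, "wb") as handle:
            for _ in range(size_mb // chunk_mb):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push buffered data out
        elapsed = time.perf_counter() - start
    finally:
        if os.path.exists(path):
            os.remove(path)  # clean up the test file
    return size_mb / elapsed


# Demo against the local temp directory; point it at a mapped share to test the server.
rate = measure_write_mb_per_s(tempfile.gettempdir(), size_mb=8)
print(f"{rate:.1f} MB/sec")
```

A small test file mostly measures caching, so a size well above the RAM cache on both ends gives a number closer to what a file manager reports for large copies.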
July 5, 2007

>> Just a suggestion for the next version: is there a way you could integrate a maintenance tab into the next GUI? It would be nice if we could run (schedule?) the occasional reiserfsck from there to keep tabs on the drives. Maybe dump the output to a file viewable in the GUI as well.
>
> In the works.

Yippee...
August 26, 2007

Ah ha, seems I have both the kernel bug and the Reiser corruption on my system as well! I'm going to try to follow the wiki again now that I've completed a parity check (no errors). Previous attempts came back saying that md1 didn't exist. Right now it accepted the command - w00t! I will now try to check all of my disks <shiver>

Will I need to do this on a regular basis?

My situation so far -> http://lime-technology.com/forum/index.php?topic=924.0
August 26, 2007

> Ah ha, seems I have both the kernel bug and the Reiser corruption on my system as well! I'm going to try to follow the wiki again now that I've completed a parity check (no errors). Previous attempts came back saying that md1 didn't exist. Right now it accepted the command - w00t! I will now try to check all of my disks <shiver>
>
> Will I need to do this on a regular basis?
>
> My situation so far -> http://lime-technology.com/forum/index.php?topic=924.0

This suggestion has been given by Tom to those using 4.1 and having network problems. Although you are not having network problems, you might want to try the same fix. Basically, Tom has said all prior releases had this option enabled. He thought the kernel was stable enough without it and did not enable it in the 4.1 version. It modifies how interrupts are handled. I can see how interrupt conflicts could cause all sorts of weird issues, so it is worth a try.

His suggestion was to try this:

Edit the file 'syslinux.cfg' on the Flash device. On the line that reads:

append initrd=bzroot rootdelay=10

change to

append initrd=bzroot rootdelay=10 nolapic

Then reboot your server.

It seems to have fixed the problems where people are losing their network connections. Who knows, it might just get you going too.

Joe L.
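The syslinux.cfg change described above is a one-word edit and is easy to do by hand. For those who prefer to script it, here is a minimal Python sketch that works on the file's text; the function name `add_nolapic` is hypothetical, and the only config line used is the one quoted in this post. It leaves every other line untouched and does nothing if `nolapic` is already present.

```python
def add_nolapic(cfg_text):
    """Return cfg_text with 'nolapic' appended to each 'append' line that lacks it."""
    result = []
    for line in cfg_text.splitlines():
        stripped = line.strip()
        # Only touch kernel 'append' lines, and only once.
        if stripped.startswith("append ") and "nolapic" not in stripped.split():
            line = line.rstrip() + " nolapic"
        result.append(line)
    return "\n".join(result) + "\n"


original = "append initrd=bzroot rootdelay=10\n"
updated = add_nolapic(original)
print(updated, end="")
```

Keeping a backup copy of the original file on the flash device before writing the result back makes it simple to undo the change.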
August 26, 2007

Think my problem might be deeper; going to reply in the other thread as it has more info. The issues here sound similar, but I think I might need Tom for this one. Already lost data once upon a time - on two disks, dammit - by dorking around. Fool me once....