jus7incase Posted May 28, 2015 Author

"It's totally normal for the 8TB Archive. Both mine have similarly high numbers after 22 days. Perhaps you should only give advice when you know what you're talking about, and not lead people on a wild goose chase?"

"Yeah, right... IN FACT that disk makes his system kernel panic. SMART ID #7 is BAD even by the disk firmware's own standard: a worst value of 60 (on a scale of 100) against a SMART failure threshold fixed at 30, on a disk that is 22 days old, is... BAD. IMHO. End of sentence. Or probably this is the reason why I never go for Seagates."

After dumping SMART reports for all drives, I can tell that all the Seagates report similar numbers. It seems to be normal for Seagate consumer-grade drives (not saying that this is good, though). Also note that the numbers for the WD and Samsung drives in my system are farther away from the failure threshold.
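For anyone comparing their own SMART dumps: the disagreement above comes down to how far a normalized attribute sits above its failure threshold. The following is only a minimal sketch in Python, not a replacement for smartctl. The `SmartAttr` class and the `headroom` and `has_failed` helpers are names made up for this illustration; only the worst value (60) and threshold (30) for attribute 7 come from the post. It assumes the usual convention that an attribute counts as failed once its normalized value is at or below the threshold.

```python
from dataclasses import dataclass


@dataclass
class SmartAttr:
    """One normalized SMART attribute (hypothetical helper type)."""
    attr_id: int
    worst: int    # lowest normalized value the drive has recorded
    thresh: int   # vendor failure threshold


def headroom(attr: SmartAttr) -> int:
    """Distance between the worst recorded value and the threshold."""
    return attr.worst - attr.thresh


def has_failed(attr: SmartAttr) -> bool:
    """Assumed convention: failed when worst <= threshold (threshold 0 never fails)."""
    return attr.thresh > 0 and attr.worst <= attr.thresh


# Attribute 7 as quoted in the post: worst 60, threshold 30.
attr7 = SmartAttr(attr_id=7, worst=60, thresh=30)
print(f"ID {attr7.attr_id}: headroom={headroom(attr7)} failed={has_failed(attr7)}")
```

By this measure the attribute has not failed, which fits both readings in the thread: the drive is still above its threshold, but with noticeably less margin than the WD and Samsung drives are said to have.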
HellDiverUK Posted May 28, 2015

From observing my 8TB Archive drives, and having had one report a SMART failure which has since gone away, I think these drives throw a failure when the following scenario happens:

1. A lot of small files are copied to the drive in one lump (my test is 60GB of MP3 files).
2. The drive starts shingling.
3. The drive is interrupted during the shingling procedure on several occasions (by rebooting the machine).
4. The drive throws a SMART failure.

If I leave the drive idle for 5-10 minutes until the accessing noises stop (i.e. shingling has finished), I can reboot the machine as many times and as frequently as I want with no SMART failure. If I don't let the drive idle, so that it has no chance to finish its housekeeping, and then do the same number of reboots, the drive will crap out.

If I leave the drive running with the PC's BIOS "SMART failure" error on the screen, about 20 minutes of accessing noises can be observed, and once that is done and a reset performed, the drive works perfectly.

These drives are designed to be run 24/7, and they really should be run that way, because shingling (housekeeping) shouldn't be interrupted. I also think the SMART figures are odd because they're reporting something different compared to a traditional PMR drive.
jus7incase Posted May 28, 2015 Author

Folks, I have now definitely narrowed the kernel panic down to the mover. I scheduled it and monitored the syslog with tail -f. Here is what I could capture via telnet and on the console. After that the machine was unresponsive. Alas, the error dump that follows at the bottom is so far opaque to me. If anyone can make sense of this, please go ahead.

For the record: the faulty 3TB drive I reported above is not allocated to the array, but is connected to the Supermicro board on a SATA3 connector. The 8TB parity drive is connected to the Supermicro board on SATA3, the 250GB Samsung cache drive to SATA2 on the Supermicro board, and the 8TB data drive to SATA2 on the Supermicro board. The rest of the drives are connected to Supermicro SATA2 and to the Digitus controller.

I would assume the mover copies from the 250GB cache drive to the new 8TB data drive, thereby also accessing the 8TB parity drive. Boom.

May 28 20:41:00 unRAID kernel: df7d1440 f3671f00 df7d1440 c82e3d50 c10a16aa f0ee8380 df7d1710 c82e3da4
May 28 20:41:00 unRAID kernel: c102acca 00000000 00000009 00000246 c82e3ec0 c82e3d88 00000001 00000000
May 28 20:41:00 unRAID kernel: Call Trace:
May 28 20:41:00 unRAID kernel: [<c10a1604>] put_files_struct+0x55/0x8e
May 28 20:41:00 unRAID kernel: [<c10a16aa>] exit_files+0x34/0x38
May 28 20:41:00 unRAID kernel: [<c102acca>] do_exit+0x2c7/0x73e
May 28 20:41:00 unRAID kernel: [<c1027090>] ? print_oops_end_marker+0x2a/0x2c
May 28 20:41:00 unRAID kernel: [<c1004c3a>] oops_end+0x79/0x7e
May 28 20:41:00 unRAID kernel: [<c13df945>] no_context+0x1ad/0x1b5
May 28 20:41:00 unRAID kernel: [<c13dfbd9>] __bad_area_nosemaphore+0x125/0x12d
May 28 20:41:00 unRAID kernel: [<c13dfc69>] bad_area+0x37/0x3d
May 28 20:41:00 unRAID kernel: [<c10209d7>] __do_page_fault+0x1bf/0x391
May 28 20:41:00 unRAID kernel: [<c104466f>] ? sched_clock_cpu+0x3f/0x15e
May 28 20:41:00 unRAID kernel: [<c1020bae>] ? vmalloc_sync_all+0x5/0x5
May 28 20:41:00 unRAID kernel: [<c1020bb6>] do_page_fault+0x8/0xa
May 28 20:41:00 unRAID kernel: [<c13e63ea>] error_code+0x5a/0x60
May 28 20:41:00 unRAID kernel: [<c104007b>] ? clean_sort_range+0x11/0xc8
May 28 20:41:00 unRAID kernel: [<c10a007b>] ? iput+0x67/0xe5
May 28 20:41:00 unRAID kernel: [<c1020bae>] ? vmalloc_sync_all+0x5/0x5
May 28 20:41:00 unRAID kernel: [<c10a14fa>] ? dup_fd+0x13d/0x1ca
May 28 20:41:00 unRAID kernel: [<c1025d8a>] copy_process.part.60+0x3db/0xd1d
May 28 20:41:00 unRAID kernel: [<c10267a4>] do_fork+0xbb/0x20d
May 28 20:41:00 unRAID kernel: [<c102698d>] sys_clone+0x20/0x22
May 28 20:41:00 unRAID kernel: [<c13e5f40>] syscall_call+0x7/0xb
May 28 20:41:00 unRAID kernel: Code: 00 8d 90 01 02 00 00 83 fa 01 77 07 b8 fc ff ff ff eb 0d 89 c2 83 e2 fd 81 fa fc fd ff ff 74 ec 5d c3 55 89 e5 57 56 53 89 c3 51 <8b> 40 20 85 c0 75 10 c7 04 24 42 c6 49 c1 31 f6 e8 b6 2c 35 00
May 28 20:41:00 unRAID kernel: EIP: [<c108d19d>] filp_close+0x9/0x61 SS:ESP 0068:c82e3d14
May 28 20:41:00 unRAID kernel: CR2: 0000000073750062
May 28 20:41:00 unRAID kernel: ---[ end trace 1d6206f196c7d87a ]---
May 28 20:41:00 unRAID kernel: Fixing recursive fault but reboot is needed!

Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Call Trace:
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Stack:
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Process find (pid: 12432, ti=c82e2000 task=df7d1440 task.ti=c82e2000)
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Code: 00 00 00 89 55 dc 99 f7 f9 8b 55 e0 8b 72 0c 89 c1 f3 a4 89 c1 8b 72 08 31 d2 8b 7b 08 f3 a4 eb 1d 8b 4d dc 8b 04 91 85 c0 74 06 <f0> ff 40 20 eb 06 8b 4b 0c 0f b3 11 8b 4d e4 89 04 91 42 3b 55
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Call Trace:
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: EIP: [<c108d19d>] filp_close+0x9/0x61 SS:ESP 0068:c82e3d14
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: EIP: [<c10a14fa>] dup_fd+0x13d/0x1ca SS:ESP 0068:c82e3efc
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Stack:
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Process find (pid: 12432, ti=c82e2000 task=df7d1440 task.ti=c82e2000)
Message from syslogd@unRAID at Thu May 28 20:41:00 2015 ...
unRAID kernel: Code: 00 8d 90 01 02 00 00 83 fa 01 77 07 b8 fc ff ff ff eb 0d 89 c2 83 e2 fd 81 fa fc fd ff ff 74 ec 5d c3 55 89 e5 57 56 53 89 c3 51 <8b> 40 20 85 c0 75 10 c7 04 24 42 c6 49 c1 31 f6 e8 b6 2c 35 00
jus7incase Posted May 29, 2015 Author

Since the log above may not be very helpful, I tried to reproduce the problem and capture more of the log. I found the following in the log of the mover:

rsync: write failed on "/mnt/user0/download/sabnzbd/complete/couchpotato/.DS_Store": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
rsync: connection unexpectedly closed (29 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]

After that it carries on, though, and when copying a large file it breaks down like this (snippet from the syslog):

May 29 12:02:15 unRAID shfs/user0: shfs_write: write: (28) No space left on device
May 29 12:02:18 unRAID rsync: *** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2450 ***
May 29 12:02:19 unRAID rsync: *** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2450 ***
May 29 12:02:20 unRAID rsync: *** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2458 ***
May 29 12:02:21 unRAID rsync: *** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2450 ***
May 29 12:02:22 unRAID rsync: *** glibc detected *** rsync: free(): invalid next size (normal): 0x080d24a8 ***
May 29 12:02:23 unRAID kernel: BUG: unable to handle kernel paging request at 2e6b6c89
May 29 12:02:23 unRAID kernel: IP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a
May 29 12:02:23 unRAID kernel: *pdpt = 0000000030ff3001 *pde = 0000000000000000
May 29 12:02:23 unRAID kernel: Oops: 0000 [#1] SMP
May 29 12:02:23 unRAID kernel: Modules linked in: md_mod sit2fe(O) m88ds3103(O) cx25840(O) sg cx23885(O) rc_core(O) videobuf_dma_sg(O) snd_pcm snd_timer snd_page_alloc cx2341x(O) v4l2_common(O) i2c_i801 ahci libahci coretemp hwmon videodev(O) tda18271(O) snd soundcore videobuf_dvb(O) e1000e dvb_core(O) videobuf_core(O) btcx_risc(O) ptp tveeprom(O) i2c_core pps_core [last unloaded: md_mod]
May 29 12:02:23 unRAID kernel: Pid: 4008, comm: fuser Tainted: G O 3.9.6p-unRAID #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
May 29 12:02:23 unRAID kernel: EIP: 0060:[<c10c9931>] EFLAGS: 00010202 CPU: 1
May 29 12:02:23 unRAID kernel: EIP is at tid_fd_revalidate+0x56/0x12a
May 29 12:02:23 unRAID kernel: EAX: f740a300 EBX: f3498000 ECX: f0eda000 EDX: 2e6b6c61
May 29 12:02:23 unRAID kernel: ESI: eeea54c0 EDI: eeea0380 EBP: f71d1ebc ESP: f71d1eac
May 29 12:02:23 unRAID kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
May 29 12:02:23 unRAID kernel: CR0: 80050033 CR2: 2e6b6c89 CR3: 3771a000 CR4: 000407f0
May 29 12:02:23 unRAID kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
May 29 12:02:23 unRAID kernel: DR6: ffff0ff0 DR7: 00000400
May 29 12:02:23 unRAID kernel: Process fuser (pid: 4008, ti=f71d0000 task=f34990e0 task.ti=f71d0000)
May 29 12:02:23 unRAID kernel: Stack:
May 29 12:02:23 unRAID kernel: 00000004 eeea54c0 eeea0380 f72979c0 f71d1ecc c10c9ad1 eed57680 eeea0380
May 29 12:02:23 unRAID kernel: f71d1f0c c10c6de0 00000004 0000000d c14bcfba f71d1f3b f71d1f48 f71d1f3c
May 29 12:02:23 unRAID kernel: c109ac64 f71d1f90 00000034 00000001 f71d1f3b f72979c0 00000004 f740a300
May 29 12:02:23 unRAID kernel: Call Trace:
May 29 12:02:23 unRAID kernel: [<c10c9ad1>] proc_fd_instantiate+0x6a/0x74
May 29 12:02:23 unRAID kernel: [<c10c6de0>] proc_fill_cache+0x66/0xf9
May 29 12:02:23 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 12:02:23 unRAID kernel: [<c10c9756>] proc_readfd_common+0x15d/0x1a4
May 29 12:02:23 unRAID kernel: [<c10c9a67>] ? proc_fdinfo_instantiate+0x62/0x62
May 29 12:02:23 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 12:02:23 unRAID kernel: [<c1096258>] ? final_putname+0x2d/0x30
May 29 12:02:23 unRAID kernel: [<c10c97c3>] proc_readfd+0x12/0x14
May 29 12:02:23 unRAID kernel: [<c10c9a67>] ? proc_fdinfo_instantiate+0x62/0x62
May 29 12:02:23 unRAID kernel: [<c109af56>] vfs_readdir+0x52/0x7a
May 29 12:02:23 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 12:02:23 unRAID kernel: [<c109b0e8>] sys_getdents64+0x62/0xba
May 29 12:02:23 unRAID kernel: [<c13e5f40>] syscall_call+0x7/0xb
May 29 12:02:23 unRAID kernel: Code: 00 89 55 f0 e8 78 7c fd ff 8b 55 f0 85 c0 0f 84 98 00 00 00 8b 48 04 3b 11 0f 83 88 00 00 00 8b 49 04 8d 14 91 8b 12 85 d2 74 7c <8b> 7a 28 e8 76 7c fd ff 8d 83 d0 02 00 00 e8 84 c2 31 00 8b 83
May 29 12:02:23 unRAID kernel: EIP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a SS:ESP 0068:f71d1eac
May 29 12:02:23 unRAID kernel: CR2: 000000002e6b6c89
May 29 12:02:23 unRAID kernel: ---[ end trace 7f01c1191ac43037 ]---
May 29 12:06:47 unRAID kernel: BUG: unable to handle kernel NULL pointer dereference at 00000050
May 29 12:06:47 unRAID kernel: IP: [<c109ccc5>] d_path+0x16/0x100
May 29 12:06:47 unRAID kernel: *pdpt = 000000002c868001 *pde = 0000000000000000
May 29 12:06:47 unRAID kernel: Oops: 0000 [#2] SMP
May 29 12:06:47 unRAID kernel: Modules linked in: md_mod sit2fe(O) m88ds3103(O) cx25840(O) sg cx23885(O) rc_core(O) videobuf_dma_sg(O) snd_pcm snd_timer snd_page_alloc cx2341x(O) v4l2_common(O) i2c_i801 ahci libahci coretemp hwmon videodev(O) tda18271(O) snd soundcore videobuf_dvb(O) e1000e dvb_core(O) videobuf_core(O) btcx_risc(O) ptp tveeprom(O) i2c_core pps_core [last unloaded: md_mod]
May 29 12:06:47 unRAID kernel: Pid: 4135, comm: ps Tainted: G D O 3.9.6p-unRAID #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
May 29 12:06:47 unRAID kernel: EIP: 0060:[<c109ccc5>] EFLAGS: 00010292 CPU: 3
May 29 12:06:47 unRAID kernel: EIP is at d_path+0x16/0x100
May 29 12:06:47 unRAID kernel: EAX: 00000000 EBX: 0000007f ECX: 00001000 EDX: eef40000
May 29 12:06:47 unRAID kernel: ESI: ec82ff54 EDI: eee54de0 EBP: ec82ff48 ESP: ec82ff2c
May 29 12:06:47 unRAID kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
May 29 12:06:47 unRAID kernel: CR0: 80050033 CR2: 00000050 CR3: 3094c000 CR4: 000407f0
May 29 12:06:47 unRAID kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
May 29 12:06:47 unRAID kernel: DR6: ffff0ff0 DR7: 00000400
May 29 12:06:47 unRAID kernel: Process ps (pid: 4135, ti=ec82e000 task=f0a91e60 task.ti=ec82e000)
May 29 12:06:47 unRAID kernel: Stack:
May 29 12:06:47 unRAID kernel: 00000000 00001000 eef41000 eee54de0 ec82ff48 0000007f eef40000 ec82ff68
May 29 12:06:47 unRAID kernel: c10c549e 40033d40 00000000 00000000 00004000 ec82ff80 c13ee380 ec82ff94
May 29 12:06:47 unRAID kernel: c1092396 ec82ff80 ec82ff7c eee54de0 00000000 f34ff0d0 eed56f80 bfbf9e4c
May 29 12:06:47 unRAID kernel: Call Trace:
May 29 12:06:47 unRAID kernel: [<c10c549e>] proc_pid_readlink+0x4d/0xa6
May 29 12:06:47 unRAID kernel: [<c1092396>] sys_readlinkat+0x76/0xab
May 29 12:06:47 unRAID kernel: [<c10923f2>] sys_readlink+0x27/0x29
May 29 12:06:47 unRAID kernel: [<c13e5f40>] syscall_call+0x7/0xb
May 29 12:06:47 unRAID kernel: Code: ff 19 c0 83 c0 02 eb 05 b8 02 00 00 00 83 c4 24 5b 5e 5f 5d c3 55 89 e5 56 89 c6 53 8d 04 0a 83 ec 14 89 45 ec 8b 46 04 89 4d e8 <8b> 58 50 85 db 74 0e 8b 5b 20 85 db 74 07 ff d3 e9 ce 00 00 00
May 29 12:06:47 unRAID kernel: EIP: [<c109ccc5>] d_path+0x16/0x100 SS:ESP 0068:ec82ff2c
May 29 12:06:47 unRAID kernel: CR2: 0000000000000050
May 29 12:06:47 unRAID kernel: ---[ end trace 7f01c1191ac43038 ]---

In the terminal where the mover is running it looks like this:

*** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2498 ***
======= Backtrace: =========
/lib/libc.so.6(+0x705aa)[0x400a65aa]
/lib/libc.so.6(+0x73503)[0x400a9503]
/lib/libc.so.6(cfree+0x70)[0x400ac6b0]
rsync[0x807cd74]
rsync[0x807de60]
rsync[0x804f3aa]
rsync[0x8050b5f]
rsync[0x8051e56]
rsync[0x8065825]
rsync[0x80666ac]
/lib/libc.so.6(__libc_start_main+0xe6)[0x4004cb86]
rsync[0x804aad1]
======= Memory map: ========
08048000-0809d000 r-xp 00000000 00:01 2536 /usr/bin/rsync
0809d000-080a1000 rwxp 00054000 00:01 2536 /usr/bin/rsync
080a1000-080f2000 rwxp 00000000 00:00 0 [heap]
40000000-4001d000 r-xp 00000000 00:01 4298 /lib/ld-2.11.1.so
4001d000-4001e000 r-xp 0001d000 00:01 4298 /lib/ld-2.11.1.so
4001e000-4001f000 rwxp 0001e000 00:01 4298 /lib/ld-2.11.1.so

This terminal is frozen, but this time there is no kernel panic. Can you help?
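A note on reading the two oopses in the post above: the faulting address in CR2 can be cross-checked against the register dump and the instruction bytes marked `<..>` in the "Code:" line. The Python sketch below is only an illustration with assumptions stated up front: `decode_mov_disp8` and `fault_address` are names invented here, and the decoder handles just the one instruction form that appears in these two dumps (opcode 8b, a 32-bit load from register plus 8-bit displacement). It says nothing about why the pointers were bad.

```python
# 32-bit x86 register numbering as used in the ModRM byte.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code: bytes):
    """Decode 'mov r32, [r32 + disp8]' (opcode 0x8B, mod=01, no SIB byte)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B or (modrm >> 6) != 0b01 or (modrm & 7) == 0b100:
        raise ValueError("only 'mov r32, [r32+disp8]' without SIB is handled")
    if disp >= 0x80:          # disp8 is a signed byte
        disp -= 0x100
    dest = REGS[(modrm >> 3) & 7]
    base = REGS[modrm & 7]
    return dest, base, disp


def fault_address(base_value: int, disp: int) -> int:
    """Address the load touches, wrapped to 32 bits."""
    return (base_value + disp) & 0xFFFFFFFF


# Oops #1 (fuser): faulting bytes "<8b> 7a 28", EDX: 2e6b6c61 in the dump.
dest, base, disp = decode_mov_disp8(bytes.fromhex("8b7a28"))
print(dest, base, hex(disp), hex(fault_address(0x2E6B6C61, disp)))

# Oops #2 (ps): faulting bytes "<8b> 58 50", EAX: 00000000 in the dump.
dest, base, disp = decode_mov_disp8(bytes.fromhex("8b5850"))
print(dest, base, hex(disp), hex(fault_address(0x00000000, disp)))
```

The computed addresses are 0x2e6b6c89 and 0x50, which match the CR2 values in the log. So in the first case the kernel followed a pointer (in EDX) that does not look like a kernel address, and in the second it followed a NULL pointer. Both would be consistent with corrupted file-descriptor data, though that is an interpretation and not something the log proves.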
jus7incase Posted May 29, 2015 Author

Another instance of the problem after rewiring the drives.

Mover:

/usr/local/sbin/mover 2>&1 | tee /boot/logmover.txt
mover started
skipping app/
moving download/
./download/sabnzbd/complete/couchpotato/.DS_Store
>f.stpog... download/sabnzbd/complete/couchpotato/.DS_Store
rsync: write failed on "/mnt/user0/download/sabnzbd/complete/couchpotato/.DS_Store": No space left on device (28)
rsync error: error in file IO (code 11) at receiver.c(302) [receiver=3.0.7]
rsync: connection unexpectedly closed (29 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(601) [sender=3.0.7]
moving private/
./somepath/somebigfile
*** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2468 ***
======= Backtrace: =========
/lib/libc.so.6(+0x705aa)[0x400a65aa]
/lib/libc.so.6(+0x73503)[0x400a9503]
/lib/libc.so.6(cfree+0x70)[0x400ac6b0]
rsync[0x807cd74]
rsync[0x807de60]
rsync[0x804f3aa]
rsync[0x8050b5f]
rsync[0x8051e56]
rsync[0x8065825]
rsync[0x80666ac]
/lib/libc.so.6(__libc_start_main+0xe6)[0x4004cb86]
rsync[0x804aad1]
======= Memory map: ========
08048000-0809d000 r-xp 00000000 00:01 1518 /usr/bin/rsync
0809d000-080a1000 rwxp 00054000 00:01 1518 /usr/bin/rsync
080a1000-080f2000 rwxp 00000000 00:00 0 [heap]
40000000-4001d000 r-xp 00000000 00:01 3280 /lib/ld-2.11.1.so
4001d000-4001e000 r-xp 0001d000 00:01 3280 /lib/ld-2.11.1.so
4001e000-4001f000 rwxp 0001e000 00:01 3280 /lib/ld-2.11.1.so
4001f000-40020000 r-xp 00000000 00:00 0 [vdso]
40020000-40021000 rwxp 00000000 00:00 0
40028000-4002e000 r-xp 00000000 00:01 3596 /lib/libacl.so.1.1.0
4002e000-4002f000 rwxp 00005000 00:01 3596 /lib/libacl.so.1.1.0
4002f000-40035000 r-xp 00000000 00:01 3244 /lib/libpopt.so.0.0.0
40035000-40036000 rwxp 00006000 00:01 3244 /lib/libpopt.so.0.0.0
40036000-40192000 r-xp 00000000 00:01 3019 /lib/libc-2.11.1.so
40192000-40193000 ---p 0015c000 00:01 3019 /lib/libc-2.11.1.so
40193000-40195000 r-xp 0015c000 00:01 3019 /lib/libc-2.11.1.so
40195000-40196000 rwxp 0015e000 00:01 3019 /lib/libc-2.11.1.so
40196000-40199000 rwxp 00000000 00:00 0
40199000-4019d000 r-xp 00000000 00:01 3208 /lib/libattr.so.1.1.0
4019d000-4019e000 rwxp 00003000 00:01 3208 /lib/libattr.so.1.1.0
4019e000-401a0000 rwxp 00000000 00:00 0
401a0000-401f6000 r-xp 00000000 00:01 2221 /usr/lib/locale/locale-archive
401f6000-40235000 r-xp 00000000 00:01 2233 /usr/lib/locale/en_US.utf8/LC_CTYPE
40235000-40297000 rwxp 00000000 00:00 0
40297000-402b3000 r-xp 00000000 00:01 2158 /usr/lib/libgcc_s.so.1
402b3000-402b4000 rwxp 0001b000 00:01 2158 /usr/lib/libgcc_s.so.1
40300000-40321000 rwxp 00000000 00:00 0
40321000-40400000 ---p 00000000 00:00 0
bfeda000-bfefb000 rw-p 00000000 00:00 0 [stack]
find: `rsync' terminated by signal 6
rsync: writefd_unbuffered failed to write 79 bytes to socket [Receiver]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1530) [Receiver=3.0.7]
./somepath/someotherbigfile
*** glibc detected *** rsync: free(): invalid next size (normal): 0x080d2450 ***
======= Backtrace: =========
/lib/libc.so.6(+0x705aa)[0x400a65aa]
/lib/libc.so.6(+0x73503)[0x400a9503]
/lib/libc.so.6(cfree+0x70)[0x400ac6b0]
rsync[0x807cd74]
rsync[0x807de60]
rsync[0x804f3aa]
rsync[0x8050b5f]
rsync[0x8051e56]
rsync[0x8065825]
rsync[0x80666ac]
/lib/libc.so.6(__libc_start_main+0xe6)[0x4004cb86]
rsync[0x804aad1]
======= Memory map: ========
08048000-0809d000 r-xp 00000000 00:01 1518 /usr/bin/rsync
0809d000-080a1000 rwxp 00054000 00:01 1518 /usr/bin/rsync
080a1000-080f2000 rwxp 00000000 00:00 0 [heap]
40000000-4001d000 r-xp 00000000 00:01 3280 /lib/ld-2.11.1.so
4001d000-4001e000 r-xp 0001d000 00:01 3280 /lib/ld-2.11.1.so
4001e000-4001f000 rwxp 0001e000 00:01 3280 /lib/ld-2.11.1.so
4001f000-40020000 r-xp 00000000 00:00 0 [vdso]
40020000-40021000 rwxp 00000000 00:00 0
40028000-4002e000 r-xp 00000000 00:01 3596 /lib/libacl.so.1.1.0
4002e000-4002f000 rwxp 00005000 00:01 3596 /lib/libacl.so.1.1.0
4002f000-40035000 r-xp 00000000 00:01 3244 /lib/libpopt.so.0.0.0
40035000-40036000 rwxp 00006000 00:01 3244 /lib/libpopt.so.0.0.0
40036000-40192000 r-xp 00000000 00:01 3019 /lib/libc-2.11.1.so
40192000-40193000 ---p 0015c000 00:01 3019 /lib/libc-2.11.1.so
40193000-40195000 r-xp 0015c000 00:01 3019 /lib/libc-2.11.1.so
40195000-40196000 rwxp 0015e000 00:01 3019 /lib/libc-2.11.1.so
40196000-40199000 rwxp 00000000 00:00 0
40199000-4019d000 r-xp 00000000 00:01 3208 /lib/libattr.so.1.1.0
4019d000-4019e000 rwxp 00003000 00:01 3208 /lib/libattr.so.1.1.0
4019e000-401a0000 rwxp 00000000 00:00 0
401a0000-401f6000 r-xp 00000000 00:01 2221 /usr/lib/locale/locale-archive
401f6000-40235000 r-xp 00000000 00:01 2233 /usr/lib/locale/en_US.utf8/LC_CTYPE
40235000-40297000 rwxp 00000000 00:00 0
40297000-402b3000 r-xp 00000000 00:01 2158 /usr/lib/libgcc_s.so.1
402b3000-402b4000 rwxp 0001b000 00:01 2158 /usr/lib/libgcc_s.so.1
40300000-40321000 rwxp 00000000 00:00 0
40321000-40400000 ---p 00000000 00:00 0
bf83e000-bf85f000 rw-p 00000000 00:00 0 [stack]
find: `rsync' terminated by signal 6
rsync: writefd_unbuffered failed to write 79 bytes to socket [Receiver]: Broken pipe (32)
rsync error: error in rsync protocol data stream (code 12) at io.c(1530) [Receiver=3.0.7]
find: `fuser' terminated by signal 9

Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Process fuser (pid: 2342, ti=eec72000 task=f725e880 task.ti=eec72000)
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Stack:
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Call Trace:
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Code: 00 89 55 f0 e8 78 7c fd ff 8b 55 f0 85 c0 0f 84 98 00 00 00 8b 48 04 3b 11 0f 83 88 00 00 00 8b 49 04 8d 14 91 8b 12 85 d2 74 7c <8b> 7a 28 e8 76 7c fd ff 8d 83 d0 02 00 00 e8 84 c2 31 00 8b 83
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: EIP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a SS:ESP 0068:eec73eac

Syslog:

May 29 13:15:26 unRAID shfs/user0: shfs_write: write: (28) No space left on device
May 29 13:15:28 unRAID kernel: BUG: unable to handle kernel paging request at 2e6b6c89
May 29 13:15:28 unRAID kernel: IP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a
May 29 13:15:28 unRAID kernel: *pdpt = 0000000030d7e001 *pde = 0000000000000000
May 29 13:15:28 unRAID kernel: Oops: 0000 [#1] SMP
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Process fuser (pid: 2342, ti=eec72000 task=f725e880 task.ti=eec72000)
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Stack:
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Call Trace:
May 29 13:15:28 unRAID kernel: Modules linked in: ntfs md_mod sit2fe(O) m88ds3103(O) cx25840(O) sg cx23885(O) rc_core(O) videobuf_dma_sg(O) snd_pcm snd_timer snd_page_alloc cx2341x(O) v4l2_common(O) i2c_i801 coretemp hwmon e1000e ptp pps_core videodev(O) tda18271(O) snd soundcore videobuf_dvb(O) ahci libahci dvb_core(O) videobuf_core(O) btcx_risc(O) tveeprom(O) i2c_core [last unloaded: md_mod]
May 29 13:15:28 unRAID kernel: Pid: 2342, comm: fuser Tainted: G O 3.9.6p-unRAID #1 Supermicro X9SCL/X9SCM/X9SCL/X9SCM
May 29 13:15:28 unRAID kernel: EIP: 0060:[<c10c9931>] EFLAGS: 00010202 CPU: 2
May 29 13:15:28 unRAID kernel: EIP is at tid_fd_revalidate+0x56/0x12a
May 29 13:15:28 unRAID kernel: EAX: f3681200 EBX: f3d02be0 ECX: f0ebdc00 EDX: 2e6b6c61
May 29 13:15:28 unRAID kernel: ESI: f6750440 EDI: ee961000 EBP: eec73ebc ESP: eec73eac
May 29 13:15:28 unRAID kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
May 29 13:15:28 unRAID kernel: CR0: 80050033 CR2: 2e6b6c89 CR3: 2edcd000 CR4: 000407f0
May 29 13:15:28 unRAID kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
May 29 13:15:28 unRAID kernel: DR6: ffff0ff0 DR7: 00000400
May 29 13:15:28 unRAID kernel: Process fuser (pid: 2342, ti=eec72000 task=f725e880 task.ti=eec72000)
May 29 13:15:28 unRAID kernel: Stack:
May 29 13:15:28 unRAID kernel: 0000002a f6750440 ee961000 f0f93240 eec73ecc c10c9ad1 ee87d300 ee961000
May 29 13:15:28 unRAID kernel: eec73f0c c10c6de0 0000002a 0000000d c14bcfba eec73f3b eec73f48 eec73f3d
May 29 13:15:28 unRAID kernel: c109ac64 eec73f90 00003234 00000002 eec73f3b f0f93240 0000002a f3681200
May 29 13:15:28 unRAID kernel: Call Trace:
May 29 13:15:28 unRAID kernel: [<c10c9ad1>] proc_fd_instantiate+0x6a/0x74
May 29 13:15:28 unRAID kernel: [<c10c6de0>] proc_fill_cache+0x66/0xf9
May 29 13:15:28 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 13:15:28 unRAID kernel: [<c10c9756>] proc_readfd_common+0x15d/0x1a4
May 29 13:15:28 unRAID kernel: [<c10c9a67>] ? proc_fdinfo_instantiate+0x62/0x62
May 29 13:15:28 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 13:15:28 unRAID kernel: [<c10c97c3>] proc_readfd+0x12/0x14
May 29 13:15:28 unRAID kernel: [<c10c9a67>] ? proc_fdinfo_instantiate+0x62/0x62
May 29 13:15:28 unRAID kernel: [<c109af56>] vfs_readdir+0x52/0x7a
May 29 13:15:28 unRAID kernel: [<c109ac64>] ? sys_ioctl+0x50/0x50
May 29 13:15:28 unRAID kernel: [<c109b0e8>] sys_getdents64+0x62/0xba
May 29 13:15:28 unRAID kernel: [<c13e5f40>] syscall_call+0x7/0xb
May 29 13:15:28 unRAID kernel: Code: 00 89 55 f0 e8 78 7c fd ff 8b 55 f0 85 c0 0f 84 98 00 00 00 8b 48 04 3b 11 0f 83 88 00 00 00 8b 49 04 8d 14 91 8b 12 85 d2 74 7c <8b> 7a 28 e8 76 7c fd ff 8d 83 d0 02 00 00 e8 84 c2 31 00 8b 83
May 29 13:15:28 unRAID kernel: EIP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a SS:ESP 0068:eec73eac
May 29 13:15:28 unRAID kernel: CR2: 000000002e6b6c89
May 29 13:15:28 unRAID kernel: ---[ end trace a6212521814ae9ae ]---
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: Code: 00 89 55 f0 e8 78 7c fd ff 8b 55 f0 85 c0 0f 84 98 00 00 00 8b 48 04 3b 11 0f 83 88 00 00 00 8b 49 04 8d 14 91 8b 12 85 d2 74 7c <8b> 7a 28 e8 76 7c fd ff 8d 83 d0 02 00 00 e8 84 c2 31 00 8b 83
Message from syslogd@unRAID at Fri May 29 13:15:28 2015 ...
unRAID kernel: EIP: [<c10c9931>] tid_fd_revalidate+0x56/0x12a SS:ESP 0068:eec73eac
WeeboTech Posted May 29, 2015

I keep seeing "No space left on device (28)".

I'm not so sure it's the kernel; however, the glibc free failure could lead to other issues which end up getting back to the kernel. Maybe the split level needs to be adjusted, or there is a problem with SHFS and its handling of out-of-space conditions (which could be a bug in the user share or FUSE code).

Has a memtest been run to ensure that the memory is good?
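For context on the "(28)" in those messages: it is the operating system's error number for an out-of-space write, which Linux names ENOSPC. The short Python sketch below shows how a program can recognise that condition and stop cleanly instead of carrying on. The helper names `describe_errno` and `is_out_of_space` are made up for this illustration, and nothing here reflects how rsync or SHFS handle the error internally.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Symbolic name plus the system's message for an errno value."""
    return f"{errno.errorcode.get(code, 'UNKNOWN')}: {os.strerror(code)}"


def is_out_of_space(exc: OSError) -> bool:
    """True when a failed write was caused by a full device."""
    return exc.errno == errno.ENOSPC


# Simulate the failure seen in the mover log without touching a real disk.
try:
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
except OSError as exc:
    if is_out_of_space(exc):
        print("target is full:", describe_errno(exc.errno))
    else:
        raise
```

A well-behaved writer treats this as an ordinary, recoverable error. That is why the crashes that follow it in the logs point at a bug or a fault somewhere, not at the full disk itself.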
jus7incase Posted May 29, 2015 Author

"I keep seeing "No space left on device (28)". I'm not so sure it's the kernel; however, the glibc free failure could lead to other issues which end up getting back to the kernel. Maybe the split level needs to be adjusted, or there is a problem with SHFS and its handling of out-of-space conditions (which could be a bug in the user share or FUSE code). Has a memtest been run to ensure that the memory is good?"

This kind of debugging is beyond me. I have not run a memory test; let me know how. FYI, this is a Supermicro board using ECC RAM. Would faulty ECC memory cause this, or would it be detected at a different layer?
BRiT Posted May 29, 2015

Also, after you do a 12-hour run of memtest, try running filesystem checks.
jus7incase Posted May 29, 2015 Author

"Also, after you do a 12-hour run of memtest, try running filesystem checks."

Happy to do so if you can give me pointers on how to. Thanks!
SSD Posted May 30, 2015

When you boot from the USB stick, one of the options is a memory test.
jus7incase Posted May 30, 2015 Author

Got it. It is running. I will come back with the results in the evening. Since the problem is currently 100% repeatable, I suppose a few hours of memory testing would be enough? 6h?
SSD Posted May 30, 2015

"Got it. It is running. I will come back with the results in the evening. Since the problem is currently 100% repeatable, I suppose a few hours of memory testing would be enough? 6h?"

24h is recommended.
jus7incase Posted May 30, 2015 Author

After 4h: 7 passes, 0 errors. Will do 4 days from Wed to Sun.
jus7incase Posted May 31, 2015 Author

Folks, I found the option that causes the problem: rsync -X

If I omit the X from the options, the kernel panic does not occur. I have to see in a longer run whether that still holds. (FYI, I removed the rsync options and then put them back one by one until the kernel panic occurred, so I am pretty sure it is the -X option.)

Could it be that something -X does has problems with very large disks?
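The elimination procedure described above (start with no options, then add them back one at a time until the panic returns) can be sketched as follows. This is a hedged illustration only. The option list is hypothetical, since the thread never shows the mover's actual rsync flags, and `-t -p -o -g -X` are stand-ins. The source and destination paths are the cache and user0 mounts mentioned elsewhere in the thread. The script only builds the command lines; it deliberately does not run rsync.

```python
from typing import Iterator, List


def cumulative_option_sets(options: List[str]) -> Iterator[List[str]]:
    """Yield the options one more at a time: [a], [a, b], [a, b, c], ..."""
    for count in range(1, len(options) + 1):
        yield options[:count]


def build_commands(base: List[str], options: List[str],
                   src: str, dst: str) -> List[List[str]]:
    """One argv list per test round, each adding a single extra option."""
    return [[*base, *opts, src, dst] for opts in cumulative_option_sets(options)]


# Hypothetical flags; substitute the real ones from the mover script.
candidate_options = ["-t", "-p", "-o", "-g", "-X"]

for argv in build_commands(["rsync"], candidate_options,
                           "/mnt/cache/", "/mnt/user0/"):
    # Run each round by hand and note the first one that triggers the panic.
    print(" ".join(argv))
```

The first round that fails implicates the option added in that round. Re-testing with only that option, and again with everything except it, helps rule out an interaction between flags.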
c3 Posted May 31, 2015

"Folks, I found the option that causes the problem: rsync -X. If I omit the X from the options, the kernel panic does not occur. I have to see in a longer run whether that still holds. (FYI, I removed the rsync options and then put them back one by one until the kernel panic occurred, so I am pretty sure it is the -X option.) Could it be that something -X does has problems with very large disks?"

More likely to do with different filesystems.
SSD Posted May 31, 2015

"Folks, I found the option that causes the problem: rsync -X. If I omit the X from the options, the kernel panic does not occur. I have to see in a longer run whether that still holds. (FYI, I removed the rsync options and then put them back one by one until the kernel panic occurred, so I am pretty sure it is the -X option.) Could it be that something -X does has problems with very large disks?"

-X, --xattrs preserve extended attributes

That doesn't exactly scream "kernel panic". But it may cause additional small I/Os to read the extended attributes on each file and then write them to the target file. This may be exposing the issue, but I still expect you have some system issue that is the root cause.

"More likely to do with different filesystems."

Interesting thought. Maybe rsync is having issues copying extended attributes between different file systems. jus7incase, are the disks on different file systems?
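To see what -X would actually have to copy for a given file, the extended attributes can be listed directly. The sketch below uses Python's `os.listxattr` and `os.getxattr`, which are available on Linux only; `read_xattrs` is a name invented for this example. It returns an empty result when the platform or filesystem does not support extended attributes, which is an assumption made here to keep the example harmless to run anywhere.

```python
import os
from typing import Dict


def read_xattrs(path: str) -> Dict[str, bytes]:
    """Return the extended attributes of a file as {name: raw value}.

    Returns an empty dict when xattrs are unsupported or the path is unreadable.
    """
    attrs: Dict[str, bytes] = {}
    if not hasattr(os, "listxattr"):      # not Linux: nothing to report
        return attrs
    try:
        names = os.listxattr(path)
    except OSError:                        # missing file or unsupported filesystem
        return attrs
    for name in names:
        try:
            attrs[name] = os.getxattr(path, name)
        except OSError:                    # attribute vanished or is not readable
            continue
    return attrs


# Example: inspect the current directory.
for name, value in read_xattrs(".").items():
    print(name, len(value), "bytes")
```

Running this against a file on the cache drive and again on its copy in the array would show whether any attributes exist at all. If there are none, -X has little to transfer and the trigger is more likely the extra system calls themselves than the attribute data.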
jus7incase Posted June 1, 2015 Author

Rsync copies from /mnt/cache to /mnt/user0. Both should use ReiserFS, which is the standard for unRAID.