December 25, 2008
I just upgraded a 500GB disk to 1TB, and during the rebuild (it's still in progress) there was an Emask error:

Dec 25 04:47:00 Tower3 kernel: ata3.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x2
Dec 25 04:47:00 Tower3 kernel: ata3.00: irq_stat 0x08000000
Dec 25 04:47:00 Tower3 kernel: ata3: SError: { UnrecovData 10B8B Dispar BadCRC Handshk }
Dec 25 04:47:00 Tower3 kernel: ata3.00: cmd 60/00:00:27:b9:f0/04:00:19:00:00/40 tag 0 ncq 524288 in
Dec 25 04:47:00 Tower3 kernel: res 40/00:04:27:b9:f0/00:00:19:00:00/40 Emask 0x10 (ATA bus error)
Dec 25 04:47:00 Tower3 kernel: ata3.00: status: { DRDY }

The full syslog is below. Can anyone help identify which disk/controller is the problem? I assume that with this error the rebuilt drive may have corrupted data. Should I install the old 500GB drive and run a new parity check? Thanks for any help.
December 25, 2008
This looks like just a momentary communications glitch on the bus under heavy traffic. I don't believe you have any problems at all. unRAID was not aware of it, because it is detected and dealt with at a lower level: unRAID requested data and got it, with only a brief delay this one time while the link was reset.

At the lower level, an ATA bus error occurred (the SError flags report 10b/8b decoding, disparity, CRC and handshake errors, i.e. data garbled on the link), so the ATA subsystem paused and reset the link. After about 2 seconds the link came back up at full speed without any further issues, and the requested data was returned to unRAID.

I've been searching for similar errors, and the experts comment that the SATA 'system' is somewhat fragile, especially with early SATA chipsets, and subject to a number of glitches. Extra effort has gone into the exception handler so that it handles these cases properly and restores the channel to normal operation. The Emask exception reports certainly look scary and are too cryptic for almost everyone, but they are trying to report as much technical information as possible. I would guess that in a year or two, as the SATA world stabilizes, the error reporting will be greatly 'simplified' to brief messages, as in most other subsystems. I really don't like these hard resets, but the experts seem to be working very hard to improve the safety and stability of the whole operation.

Apart from this entry in the syslog, I don't think you will see any errors. If you are in any way unsure, you could mount the old drive and compare the files. I'm not familiar with file compare tools under Linux, though; others here can help.
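The flag names in the SError line are decoded from the SErr value in the first log line, and the "(ATA bus error)" text comes from the Emask value. As a rough illustration, here is a minimal POSIX shell sketch that turns both hex values back into names. The bit positions and names are an assumption, based on how the Linux libata driver of that era labels them, and the function names decode_bits, decode_serror and decode_emask are hypothetical (not part of unRAID or the kernel). The sketch lists every bit that is set.

```sh
#!/bin/sh
# Sketch: decode libata SError / Emask hex values into flag names.
# ASSUMPTION: bit numbers and names follow the Linux libata conventions
# of the 2.6.2x kernels; check your kernel's source if it matters.

# decode_bits VALUE SEPARATOR   (reads "bit name" pairs on stdin)
decode_bits() {
    value=$(( $1 ))
    sep=$2
    out=""
    while read -r bit name; do
        # Keep the name if this bit is set in VALUE.
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out$sep}$name"
        fi
    done
    printf '%s\n' "$out"
}

# SError register bits, as named in "ata3: SError: { ... }" lines.
decode_serror() {
    decode_bits "$1" ' ' <<'EOF'
0 RecovData
1 RecovComm
8 UnrecovData
9 Persist
10 Proto
11 HostInt
16 PHYRdyChg
17 PHYInt
18 CommWake
19 10B8B
20 Dispar
21 BadCRC
22 Handshk
23 LinkSeq
24 TrStaTrns
25 UnrecFIS
26 DevExch
EOF
}

# Emask bits, as named in "Emask 0x.. (...)" lines.
decode_emask() {
    decode_bits "$1" ', ' <<'EOF'
0 device error
1 HSM violation
2 timeout
3 media error
4 ATA bus error
5 host bus error
6 internal error
7 invalid argument
EOF
}

decode_emask 0x10          # prints: ATA bus error
decode_serror 0x780100     # prints: UnrecovData 10B8B Dispar BadCRC Handshk
```

Applied to the values in the first post, SErr 0x780100 yields the same five flags the kernel printed. All of them describe errors on the link itself, and Emask 0x10 does not include the media-error bit, which is consistent with a bus glitch rather than a bad sector.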
December 27, 2008 (Author)
Thanks for the reply. To be safe I installed the 500GB disk again, did a parity check and then tried to upgrade the disk again. It was rebuilding during the night, and in the morning unRAID had crashed. I can't access the server with the web browser or telnet. I was tailing the syslog and this is all I can see:

Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772792/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772800/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772808/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772816/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772824/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772832/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772840/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772848/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772856/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772864/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772872/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772880/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772888/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772896/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772904/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772912/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772920/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772928/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772936/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772944/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772952/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772960/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772968/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772976/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772984/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976772992/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773000/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773008/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773016/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773024/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773032/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773040/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773048/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773056/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773064/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773072/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773080/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773088/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md8: read error!
Dec 26 20:47:01 Tower3 kernel: handle_stripe read error: 976773096/8, count: 1
Dec 26 20:47:01 Tower3 kernel: md: sync done. time=21001sec rate=46510K/sec
Dec 26 20:47:01 Tower3 kernel: crond invoked oom-killer: gfp_mask=0x1280d2, order=0, oomkilladj=0
Dec 26 20:47:01 Tower3 kernel: Pid: 1939, comm: crond Not tainted 2.6.24.4-unRAID #10
Dec 26 20:47:01 Tower3 kernel: [<c0134154>] oom_kill_process+0x53/0xf7
Dec 26 20:47:01 Tower3 kernel: [<c013447c>] out_of_memory+0x141/0x16d
Dec 26 20:47:01 Tower3 kernel: [<c0135e83>] __alloc_pages+0x238/0x2c6
Dec 26 20:47:01 Tower3 kernel: [<c013bdaf>] do_wp_page+0x354/0x3eb
Dec 26 20:47:01 Tower3 kernel: [<c013cc2f>] handle_mm_fault+0x168/0x50f
Dec 26 20:47:01 Tower3 kernel: [<c0111eff>] do_page_fault+0x18c/0x52d
Dec 26 20:47:01 Tower3 kernel: [<c0154f3c>] do_fcntl+0xeb/0x259
Dec 26 20:47:01 Tower3 kernel: [<c0111d73>] do_page_fault+0x0/0x52d
Dec 26 20:47:01 Tower3 kernel: [<c02f21ea>] error_code+0x6a/0x70
Dec 26 20:47:01 Tower3 kernel: [<c02f0000>] rpc_rmdir+0x51/0x5d
Dec 26 20:47:01 Tower3 kernel: =======================
Dec 26 20:47:01 Tower3 kernel: Mem-info:
Dec 26 20:47:01 Tower3 kernel: DMA per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Dec 26 20:47:01 Tower3 kernel: Normal per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 139 Cold: hi: 62, btch: 15 usd: 53
Dec 26 20:47:01 Tower3 kernel: HighMem per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 42, btch: 7 usd: 34 Cold: hi: 14, btch: 3 usd: 12
Dec 26 20:47:01 Tower3 kernel: Active:130589 inactive:113812 dirty:0 writeback:3 unstable:0
Dec 26 20:47:01 Tower3 kernel: free:2913 slab:1939 mapped:120 pagetables:73 bounce:0
Dec 26 20:47:01 Tower3 kernel: DMA free:4064kB min:68kB low:84kB high:100kB active:4552kB inactive:4256kB present:16256kB pages_scanned:312948 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 873 999 999
Dec 26 20:47:01 Tower3 kernel: Normal free:7476kB min:3744kB low:4680kB high:5616kB active:452276kB inactive:395124kB present:894080kB pages_scanned:583642958 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 0 1012 1012
Dec 26 20:47:01 Tower3 kernel: HighMem free:112kB min:128kB low:260kB high:396kB active:65528kB inactive:55868kB present:129604kB pages_scanned:1388007 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 0 0 0
Dec 26 20:47:01 Tower3 kernel: DMA: 0*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4064kB
Dec 26 20:47:01 Tower3 kernel: Normal: 1*4kB 8*8kB 1*16kB 1*32kB 5*64kB 9*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7476kB
Dec 26 20:47:01 Tower3 kernel: HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 112kB
Dec 26 20:47:01 Tower3 kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Dec 26 20:47:01 Tower3 kernel: Free swap = 0kB
Dec 26 20:47:01 Tower3 kernel: Total swap = 0kB
Dec 26 20:47:01 Tower3 kernel: Free swap: 0kB
Dec 26 20:47:01 Tower3 kernel: 262032 pages of RAM
Dec 26 20:47:01 Tower3 kernel: 32656 pages of HIGHMEM
Dec 26 20:47:01 Tower3 kernel: 3159 reserved pages
Dec 26 20:47:01 Tower3 kernel: 211 pages shared
Dec 26 20:47:01 Tower3 kernel: 0 pages swap cached
Dec 26 20:47:01 Tower3 kernel: 0 pages dirty
Dec 26 20:47:01 Tower3 kernel: 3 pages writeback
Dec 26 20:47:01 Tower3 kernel: 120 pages mapped
Dec 26 20:47:01 Tower3 kernel: 1939 pages slab
Dec 26 20:47:01 Tower3 kernel: 73 pages pagetables
Dec 26 20:47:01 Tower3 kernel: Out of memory: kill process 1941 (atd) score 58 or a child
Dec 26 20:47:01 Tower3 kernel: Killed process 1941 (atd)
Dec 26 20:47:01 Tower3 kernel: syslogd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0
Dec 26 20:47:01 Tower3 kernel: Pid: 1390, comm: syslogd Not tainted 2.6.24.4-unRAID #10
Dec 26 20:47:01 Tower3 kernel: [<c0134154>] oom_kill_process+0x53/0xf7
Dec 26 20:47:01 Tower3 kernel: [<c013447c>] out_of_memory+0x141/0x16d
Dec 26 20:47:01 Tower3 kernel: [<c0135e83>] __alloc_pages+0x238/0x2c6
Dec 26 20:47:01 Tower3 kernel: [<c011370e>] enqueue_entity+0x2b/0x3d
Dec 26 20:47:01 Tower3 kernel: [<c0131cd3>] __grab_cache_page+0x59/0x8b
Dec 26 20:47:01 Tower3 kernel: [<c0160f36>] simple_write_begin+0x27/0x56
Dec 26 20:47:01 Tower3 kernel: [<c0132950>] generic_file_buffered_write+0x104/0x596
Dec 26 20:47:01 Tower3 kernel: [<c0133216>] __generic_file_aio_write_nolock+0x434/0x490
Dec 26 20:47:01 Tower3 kernel: [<f88520a8>] ata_altstatus+0x1c/0x20 [libata]
Dec 26 20:47:01 Tower3 kernel: [<f884e33b>] ata_qc_issue_prot+0xe8/0x21e [libata]
Dec 26 20:47:01 Tower3 kernel: [<c01561eb>] core_sys_select+0x18e/0x2a0
Dec 26 20:47:01 Tower3 kernel: [<c01332c4>] generic_file_aio_write+0x52/0xb0
Dec 26 20:47:01 Tower3 kernel: [<c014c12b>] do_sync_readv_writev+0xc0/0xfd
Dec 26 20:47:01 Tower3 kernel: [<c0123ded>] autoremove_wake_function+0x0/0x35
Dec 26 20:47:01 Tower3 kernel: [<c013ba1c>] __do_fault+0x2ac/0x2eb
Dec 26 20:47:01 Tower3 kernel: [<c014bfe5>] rw_copy_check_uvector+0x50/0xaa
Dec 26 20:47:01 Tower3 kernel: [<c014c7c4>] do_readv_writev+0x99/0x164
Dec 26 20:47:01 Tower3 kernel: [<c0133272>] generic_file_aio_write+0x0/0xb0
Dec 26 20:47:01 Tower3 kernel: [<c014c8cc>] vfs_writev+0x3d/0x48
Dec 26 20:47:01 Tower3 kernel: [<c014cc11>] sys_writev+0x41/0x67
Dec 26 20:47:01 Tower3 kernel: [<c0103b4e>] syscall_call+0x7/0xb
Dec 26 20:47:01 Tower3 kernel: =======================
Dec 26 20:47:01 Tower3 kernel: Mem-info:
Dec 26 20:47:01 Tower3 kernel: DMA per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 0, btch: 1 usd: 0 Cold: hi: 0, btch: 1 usd: 0
Dec 26 20:47:01 Tower3 kernel: Normal per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 186, btch: 31 usd: 140 Cold: hi: 62, btch: 15 usd: 53
Dec 26 20:47:01 Tower3 kernel: HighMem per-cpu:
Dec 26 20:47:01 Tower3 kernel: CPU 0: Hot: hi: 42, btch: 7 usd: 37 Cold: hi: 14, btch: 3 usd: 12
Dec 26 20:47:01 Tower3 kernel: Active:123356 inactive:121077 dirty:0 writeback:0 unstable:0
Dec 26 20:47:01 Tower3 kernel: free:2915 slab:1938 mapped:152 pagetables:70 bounce:0
Dec 26 20:47:01 Tower3 kernel: DMA free:4064kB min:68kB low:84kB high:100kB active:4552kB inactive:4256kB present:16256kB pages_scanned:315060 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 873 999 999
Dec 26 20:47:01 Tower3 kernel: Normal free:7484kB min:3744kB low:4680kB high:5616kB active:424316kB inactive:423212kB present:894080kB pages_scanned:448341941 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 0 1012 1012
Dec 26 20:47:01 Tower3 kernel: HighMem free:112kB min:128kB low:260kB high:396kB active:64556kB inactive:56840kB present:129604kB pages_scanned:199339 all_unreclaimable? yes
Dec 26 20:47:01 Tower3 kernel: lowmem_reserve[]: 0 0 0 0
Dec 26 20:47:01 Tower3 kernel: DMA: 0*4kB 0*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4064kB
Dec 26 20:47:01 Tower3 kernel: Normal: 1*4kB 9*8kB 1*16kB 1*32kB 5*64kB 9*128kB 5*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 7484kB
Dec 26 20:47:01 Tower3 kernel: HighMem: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 112kB
Dec 26 20:47:01 Tower3 kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Dec 26 20:47:01 Tower3 kernel: Free swap = 0kB
Dec 26 20:47:01 Tower3 kernel: Total swap = 0kB
Dec 26 20:47:01 Tower3 kernel: Free swap: 0kB
Dec 26 20:47:01 Tower3 kernel: 262032 pages of RAM
Dec 26 20:47:01 Tower3 kernel: 32656 pages of HIGHMEM
Dec 26 20:47:01 Tower3 kernel: 3159 reserved pages
Dec 26 20:47:01 Tower3 kernel: 292 pages shared
Dec 26 20:47:01 Tower3 kernel: 0 pages swap cached
Dec 26 20:47:01 Tower3 kernel: 0 pages dirty
Dec 26 20:47:01 Tower3 kernel: 0 pages writeback
Dec 26 20:47:01 Tower3 kernel: 152 pages mapped
Dec 26 20:47:01 Tower3 kernel: 1938 pages slab
Dec 26 20:47:01 Tower3 kernel: 70 pages pagetables
Dec 26 20:47:01 Tower3 kernel: Out of memory: kill process 2367 (nmbd) score 43 or a child
Dec 26 20:47:01 Tower3 kernel: Killed process 2367 (nmbd)
Dec 26 20:47:02 Tower3 kernel: md: recovery thread sync completion status: 0
December 27, 2008
Basically, communications with "disk8" (associated with /dev/md8) failed. Every time it was accessed (read), the failed read was logged to the syslog.

The syslog file lives in a file system that exists only in RAM, so eventually it used up all the RAM on your server. Linux reacted by freeing memory in the only way it could: the out-of-memory (OOM) killer started killing processes (your log shows atd and nmbd being killed). One of those was the unRAID management interface, which is why it looks as if unRAID has crashed.

If you can telnet in, try these commands:

rm /var/log/syslog
nohup /usr/local/sbin/emhttp &

The first deletes the syslog to free up some space; the second restarts the management interface so you can shut the server down.

Only start the "emhttp" process this way if the original one is not running. Type

ps -ef

If you see emhttp in the process list, do not start another one.

Joe L.
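The same recovery steps can be wrapped in a small script. This is a minimal sketch, assuming the paths given above (/var/log/syslog and /usr/local/sbin/emhttp) and a POSIX shell on the server; the function names free_syslog, is_running and start_emhttp_if_needed are hypothetical helpers, not unRAID tools. One deliberate difference: it empties the syslog in place instead of deleting it, because a deleted file that syslogd still holds open keeps occupying RAM until syslogd closes it.

```sh
#!/bin/sh
# Sketch of the recovery steps above. ASSUMPTIONS: unRAID-style paths and
# a POSIX shell; the function names are hypothetical helpers.

# Empty a log file in place. Unlike rm, truncation frees the RAM at once,
# even while syslogd keeps the file open for writing.
free_syslog() {
    : > "$1"
}

# Succeed if a process whose command line contains $1 appears in ps -ef.
# "grep -v grep" drops this pipeline's own grep commands from the list.
is_running() {
    ps -ef | grep -v grep | grep "$1" > /dev/null
}

# Start the given emhttp binary in the background only if no copy is
# already running, following the warning above not to start a second one.
start_emhttp_if_needed() {
    if is_running "$(basename "$1")"; then
        echo "emhttp is already running; not starting another copy"
    else
        nohup "$1" &
    fi
}

# Usage on the server (as root, via telnet):
#   free_syslog /var/log/syslog
#   start_emhttp_if_needed /usr/local/sbin/emhttp
```

Checking the process list before starting emhttp mirrors the manual "ps -ef" step, so running the script twice does not leave two management interfaces fighting over the same port.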
December 28, 2008 (Author)
Looking at the first syslog I posted, I think the Emask errors on ata3 were also caused by disk8, so I replaced that disk with the new 1TB instead of the one I was doing earlier, and so far all seems good. Thanks for the help.
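To check which physical drive is attached to a given ata port (the question in the first post), the messages the kernel writes when it probes the port are usually the quickest route: an "ataN: SATA link up" line and an "ataN.00: ATA-n: model, firmware, ..." line. This is a minimal sketch, assuming those boot-time messages are still present in the syslog; ata_port_info is a hypothetical helper name. The model string it prints can then be matched against the drive list in the unRAID web interface, which is ambiguous if several drives share the same model.

```sh
#!/bin/sh
# Sketch: print the boot-time probe lines for one libata port.
# ASSUMPTION: the kernel logged "ataN: SATA link up ..." and
# "ataN.00: ATA-n: <model>, <firmware>, ..." lines when the drive was found.

# ata_port_info PORT LOGFILE   e.g. ata_port_info ata3 /var/log/syslog
ata_port_info() {
    # PORT must be followed by ".NN:" or ":" so that ata30 etc. don't match;
    # then keep only the link-up and drive-identification lines.
    grep -E "$1(\.[0-9]+)?: " "$2" | grep -E 'ATA-[0-9]+:|SATA link'
}

# Usage on the server:
#   ata_port_info ata3 /var/log/syslog
```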
December 28, 2008
Quote: "Looking at the first syslog I posted, I think the Emask errors on ata3 were also caused by disk8, so I replaced that disk with the new 1TB instead of the one I was doing earlier, and so far all seems good. Thanks for the help."

I'm assuming you put the first 500GB disk back in (the one you were originally trying to replace with the larger 1TB disk). Glad you got things going again.

Joe L.