March 11, 2010
I was getting an interrupted file transfer that I couldn't explain and took a look at the syslog. What I saw follows... anyone willing and able to shed some light on these errors, please?
Mar 11 12:17:15 Dingo kernel: ata2: lost interrupt (Status 0x50)
Mar 11 12:17:15 Dingo kernel: ata2.01: exception Emask 0x50 SAct 0x0 SErr 0x40d0802 action 0x0 frozen
Mar 11 12:17:15 Dingo kernel: ata2.01: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }
Mar 11 12:17:15 Dingo kernel: ata2.01: failed command: WRITE DMA EXT
Mar 11 12:17:15 Dingo kernel: ata2.01: cmd 35/00:00:77:ef:77/00:04:00:00:00/f0 tag 0 dma 524288 out
Mar 11 12:17:15 Dingo kernel: res 40/00:ff:00:00:00/00:00:00:00:00/10 Emask 0x54 (ATA bus error)
Mar 11 12:17:15 Dingo kernel: ata2.01: status: { DRDY }
Mar 11 12:17:15 Dingo kernel: ata2.00: hard resetting link
Mar 11 12:17:16 Dingo kernel: ata2.01: hard resetting link
Mar 11 12:17:17 Dingo kernel: ata2.00: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 11 12:17:17 Dingo kernel: ata2.01: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 11 12:17:17 Dingo kernel: ata2.00: configured for UDMA/133
Mar 11 12:17:17 Dingo kernel: ata2.01: configured for UDMA/33
Mar 11 12:17:17 Dingo kernel: ata2.01: device reported invalid CHS sector 0
Mar 11 12:17:17 Dingo kernel: ata2: EH complete
syslog-2010-03-11.txt
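For anyone reading the "SATA link up" lines in a log like this one, the SStatus value can be unpacked by hand. The following is a minimal sketch, assuming the kernel prints SStatus in hexadecimal and that the fields follow the SATA register layout (DET in bits 0-3, SPD in bits 4-7, IPM in bits 8-11). The function name `decode_sstatus` is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Sketch only: decode the SStatus value printed in "SATA link up" log lines.
# Assumes the value is hexadecimal and uses the SATA SStatus field layout.

DET = {
    0x0: "no device detected",
    0x1: "device detected, PHY communication not established",
    0x3: "device present, PHY communication established",
    0x4: "PHY offline",
}
SPD = {
    0x0: "no negotiated speed",
    0x1: "Gen1 (1.5 Gbps)",
    0x2: "Gen2 (3.0 Gbps)",
    0x3: "Gen3 (6.0 Gbps)",
}
IPM = {
    0x0: "not present",
    0x1: "active",
    0x2: "partial",
    0x6: "slumber",
}


def decode_sstatus(hex_text):
    """Split an SStatus value such as '113' into its three fields."""
    value = int(hex_text, 16)
    det = value & 0xF          # device detection, bits 0-3
    spd = (value >> 4) & 0xF   # negotiated speed, bits 4-7
    ipm = (value >> 8) & 0xF   # interface power state, bits 8-11
    return {
        "det": DET.get(det, "reserved"),
        "spd": SPD.get(spd, "reserved"),
        "ipm": IPM.get(ipm, "reserved"),
    }


# 113 is the value reported for both devices on ata2 in the log above.
print(decode_sstatus("113"))
```

Read this way, SStatus 113 agrees with the "1.5 Gbps" text on the same log line: a device is present, the link is active, and it negotiated the first-generation speed.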
March 11, 2010
You may have a bad motherboard, but there are some things to try first. It looks like you have a C2SEA board with the onboard SATA ports set to an emulated mode in the BIOS, so change that to AHCI instead, as that will change the 'drivers' for the 6 onboard ports.
While the specific problem seems to be related to interrupt handling on the second emulated channel (which handles the third and fourth SATA ports), the first ports also had problems at the start. After that was worked out, they seem to have had no further issues, but that is still suspicious. I believe IRQ 15 was assigned to the problem channel (IRQ 14 was assigned to the first channel), and while in theory that sounds great, because it means you are reusing a base hardware IRQ, it still seems unusual to me, as I can hardly recall ever seeing that done before. You might re-enable the IDE channels in the BIOS to force those IRQs to be re-reserved for IDE use, and therefore unavailable to your SATA ports, just in case there is a hardware issue with those IRQs.
Another thing to try is one of the Boot Codes. Experiment with those related to interrupt handling or PCI. I would also check whether you have the latest BIOS for your board. Check also for overheating of motherboard components, especially the northbridge chipset. A remote possibility is bad power, but the errors and error flags seem too consistent for that. I would expect more random behavior with power glitches.
How do you determine if you have a bad motherboard? Often, sadly, by eliminating everything else. Once you have checked off all other possibilities, and then tried another board and found it worked great, you can finally conclude the first board was bad.
Hopefully, one of the options above, or an idea from someone else, will get you running smoothly again. While you are currently still able to use your drives, one of them (ata2.01, now configured for UDMA/33) has been slowed to almost the lowest speed possible.
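To spot a drive that has been dropped to a slower mode, the "configured for" lines in the syslog can be collected per device. This is a rough sketch rather than an unRAID tool: the function name `transfer_modes` is hypothetical, and the regular expression assumes the line format seen in the syslog excerpt in this thread.

```python
import re

# Matches lines such as:
#   Mar 11 12:17:17 Dingo kernel: ata2.01: configured for UDMA/33
CONFIGURED = re.compile(r"kernel: (ata\d+\.\d+): configured for (\S+)")


def transfer_modes(syslog_text):
    """Return the most recently reported transfer mode for each ATA device."""
    modes = {}
    for line in syslog_text.splitlines():
        match = CONFIGURED.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last mode wins.
            modes[match.group(1)] = match.group(2)
    return modes


# Two lines copied from the syslog excerpt in the first post.
SAMPLE = """\
Mar 11 12:17:17 Dingo kernel: ata2.00: configured for UDMA/133
Mar 11 12:17:17 Dingo kernel: ata2.01: configured for UDMA/33
"""

for device, mode in sorted(transfer_modes(SAMPLE).items()):
    print(device, mode)
```

A device whose mode drops after an error-handling reset, as ata2.01 does here, is the one worth chasing first.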
March 11, 2010 Author
While trying to find the BIOS setting for AHCI (which I can't), I hooked up a monitor and noticed that after unRAID boots and I get a cursor for login, there's a message after the login prompt, displaying twice. Last time I booted it with monitor, this message wasn't there. Movies is a share, in case that wasn't obvious.
exportfs: Movies has non-inet addr
I appreciate the suggestions... I plan on chasing things down as I can, in between keeping a lid on the kids.
March 12, 2010 Author
Following this thread, I changed the drives to Not Installed and didn't see the errors during the boot. I'm still tracking down some of the other suggestions, but that seems to be a large help.
March 12, 2010 Author
One of the drives is now returning this:
REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 48050635. Fsck?
March 12, 2010
That disk needs to have a reiserfsck run against it. The file-system has some corruption.
March 12, 2010
"While ..., I hooked up a monitor and noticed that after unRAID boots and I get a cursor for login, there's a message after the login prompt, displaying twice. Last time I booted it with monitor, this message wasn't there. Movies is a share, in case that wasn't obvious. exportfs: Movies has non-inet addr"
That message was also in your syslog (portion below), but seemed much less important, so I did not mention it.
Mar 5 23:37:42 Dingo emhttp: shcmd (23): /etc/rc.d/rc.nfsd restart | logger
Mar 5 23:37:44 Dingo logger: Starting NFS server daemons:
Mar 5 23:37:44 Dingo logger: /usr/sbin/exportfs -r
Mar 5 23:37:45 Dingo nss_wins[1605]: Movies has non-inet addr
Mar 5 23:37:46 Dingo nss_wins[1605]: Movies has non-inet addr
Mar 5 23:37:46 Dingo logger: /usr/sbin/rpc.nfsd 8
Mar 5 23:37:46 Dingo logger: /usr/sbin/rpc.mountd
I'm not an expert here, but it looks to me as if it is related to the NFS startup, perhaps a misconfiguration or something strange in the full path name.
"While trying to find the BIOS setting for AHCI (which I can't) ... Following this thread, I changed the drives to Not Installed and didn't see the errors during the boot. I'm still tracking down some of the other suggestions, but that seems to be a large help."
Again, I'm not an expert here, haven't seen that BIOS (seems a little non-standard), but I believe you will want the SATA drives set to 'Enhanced', not 'Compatible'. That should enable AHCI support, or at least a native SATA mode. 'Compatible' mode usually means: make the drives look like IDE drives, which results in using the ata_piix module instead of ahci.
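One way to confirm which driver actually claimed the ports after a BIOS change is to read the driver name each SCSI host reports in sysfs. The sketch below is written under assumptions: it expects the kernel to expose `/sys/class/scsi_host/hostN/proc_name`, which may not hold on every kernel or unRAID release, and `sata_drivers` is a hypothetical helper name.

```python
from pathlib import Path


def sata_drivers(sysfs_root="/sys/class/scsi_host"):
    """Map each host entry (host0, host1, ...) to the driver name it reports.

    Returns an empty dict when the sysfs directory is missing, for example
    on a system that does not expose this interface.
    """
    drivers = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return drivers
    for host in sorted(root.iterdir()):
        proc_name = host / "proc_name"
        if proc_name.is_file():
            # Typical values would be "ahci" or "ata_piix".
            drivers[host.name] = proc_name.read_text().strip()
    return drivers


if __name__ == "__main__":
    for host, driver in sata_drivers().items():
        print(host, driver)
```

If the ports still report ata_piix after switching the BIOS to 'Enhanced', the setting probably did not enable AHCI on that board.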
March 12, 2010
I think I may be having a problem similar to yours. I'm leaning towards the SATA card that is installed in mine.
Mar 12 15:51:35 Tower emhttp: Spinning up all drives...
Mar 12 15:51:35 Tower emhttp: shcmd (55): /usr/sbin/hdparm -S0 /dev/sdj >/dev/null
Mar 12 15:51:35 Tower kernel: mdcmd (63024): spinup 0
Mar 12 15:51:35 Tower kernel: mdcmd (63025): spinup 1
Mar 12 15:51:35 Tower kernel: mdcmd (63026): spinup 2
Mar 12 15:51:35 Tower kernel: mdcmd (63027): spinup 3
Mar 12 15:51:35 Tower kernel: mdcmd (63028): spinup 4
Mar 12 15:51:35 Tower kernel: mdcmd (63029): spinup 5
Mar 12 15:51:35 Tower kernel: mdcmd (63030): spinup 6
Mar 12 15:51:35 Tower kernel: mdcmd (63031): spinup 7
Mar 12 15:51:42 Tower kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Mar 12 15:51:42 Tower kernel: ata9.00: failed command: IDLE
Mar 12 15:51:42 Tower kernel: ata9.00: cmd e3/00:00:00:00:00/00:00:00:00:00/40 tag 0
Mar 12 15:51:42 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Mar 12 15:51:42 Tower kernel: ata9.00: status: { DRDY }
Mar 12 15:51:42 Tower kernel: ata9: hard resetting link
Mar 12 15:51:45 Tower kernel: ata9: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
Mar 12 15:51:45 Tower kernel: ata9.00: configured for UDMA/100
Mar 12 15:51:45 Tower kernel: ata9: EH complete
March 20, 2010 Author
"That disk needs to have a reiserfsck run against it. The file-system has some corruption."
Is there an unRAID specific procedure to follow for reiserfsck? Stop the array?
March 20, 2010
"That disk needs to have a reiserfsck run against it. The file-system has some corruption."
"Is there an unRAID specific procedure to follow for reiserfsck? Stop the array?"
No, you do not stop the array. If you did, you would not keep the parity disk in sync with the fixes. The procedure is described in the wiki here: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
Joe L.
March 20, 2010 Author
Thanks again, Joe L. Those instructions you pointed out warn never to use the --rebuild-tree option, unless reiserfsck says you must. So, of course, that's what I got:
2 found corruptions can be fixed only when running with --rebuild-tree
I'll attach dump of the output after I get the family fed.
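Since the rule is to use only the option that reiserfsck itself names, that option can be pulled out of a saved copy of its output. This is a small sketch: `suggested_option` is a hypothetical helper, and the only message wording taken from this thread is the --rebuild-tree line quoted above. It assumes other summaries use the same "when running with" phrasing.

```python
import re


def suggested_option(reiserfsck_output):
    """Return the option named in reiserfsck's summary line, or None."""
    match = re.search(r"when running with (--[a-z-]+)", reiserfsck_output)
    return match.group(1) if match else None


# The summary line quoted in the post above.
summary = "2 found corruptions can be fixed only when running with --rebuild-tree"
print(suggested_option(summary))
```

A result of None means the output named no repair option, in which case nothing further should be run on the strength of this check alone.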
March 20, 2010 Author
Also, and I just noticed this before walking away from my desk, /dev/md2 (the drive with the troubles) is showing as unformatted. I swapped out the 400GB drive for a 1TB drive, went through the process of rebuilding the array and formatting the drive and had it show up as having free space. Unless I'm completely delusional, it also looks like a fair number of the files that should be available on the system are now... not available.
March 20, 2010
"Thanks again, Joe L. Those instructions you pointed out warn never to use the --rebuild-tree option, unless reiserfsck says you must. So, of course, that's what I got: 2 found corruptions can be fixed only when running with --rebuild-tree I'll attach dump of the output after I get the family fed."
Did you then run with the rebuild-tree option as instructed? Hopefully before continuing with the other actions you described.
Joe L.
March 21, 2010 Author
Alas, not before the other actions. 200GB of various non-essential stuff went away. I'm not entirely sure what I did wrong, to be honest. I gather formatting was wrong, although it was a new, non-pre-cleared drive in the right device slot. In any case, nothing vital was lost, the original errors seem to have gone away and I appear to have a stable system again. About the only thing that seems out of alignment is the "non-inet addr" warning.
March 21, 2010
"Alas, not before the other actions. 200GB of various non-essential stuff went away. I'm not entirely sure what I did wrong, to be honest. I gather formatting was wrong, although it was a new, non-pre-cleared drive in the right device slot. In any case, nothing vital was lost, the original errors seem to have gone away and I appear to have a stable system again. About the only thing that seems out of alignment is the "non-inet addr" warning."
A new pre-cleared disk is only partitioned. It has no file-system; everything except the MBR is zeros. Attempting to use reiserfsck on it would be impossible, and it certainly would not ask you to use rebuild-tree. Therefore, the rebuild-tree request would have been for a different drive. I am not sure you have resolved all the issues with your hardware. I'm glad you did not lose anything important.