trurl Posted June 8, 2015
Is the parity check progressing at a reasonable speed? If you have any more problems be sure to get a syslog.
JP (Author) Posted June 8, 2015
I think so. 92 MB/sec at the moment. I certainly hope I can grab the syslog, but the 4 previous times it failed this past week it truly froze and I couldn't get to anything. Once I rebooted, the syslog started fresh again. Shouldn't the log files be saved somewhere in the event something like this happens?
trurl Posted June 9, 2015
Go ahead and get a syslog now and post it. It may or may not tell us anything.
JP (Author) Posted June 9, 2015
It is attached. The parity check is 42% done and seems to be working fine, but for whatever reason I never had a problem during parity checks. It would freeze afterwards. There are 128 sync errors, which seems very atypical compared with what I've seen historically. Since the freezes I've had a ton of sync errors, whereas before I would rarely have them, and if I did there were only one or two. I don't know if that information means anything, but I thought I would mention it.
Attachment: syslog3.txt
trurl Posted June 9, 2015
I typically have zero, and I think any at all should be investigated. An unclean shutdown is a likely cause. In fact, this is why an unclean shutdown automatically starts a correcting parity check.
Your syslog shows that you have booted in SAFE mode, so that is working now. And it shows it started a correcting parity check due to an unclean shutdown, followed by a lot of messages about correcting parity. So, all seems to be as expected for now.
JP (Author) Posted June 9, 2015
Ugh...Safe Mode is exhibiting the same behavior. The parity check completed just fine with the same 128 sync errors. All drives were green with no errors. I first tried transferring a 2 gig file to the server. It went fine and the speeds were what I would expect (~28 megs/sec). I then transferred about 6 gigs of smaller files. That also went through just fine. Then I tried about 20 gigs of different file sizes (500 meg - 2 gig). About 1/10 of the way through the transfer the server locked up again.
- The transfer could not continue.
- I had no access to the web interface page.
- I had no access to the syslog.
- The monitor connected to the server lost signal so there was no information there.
- Fans are still spinning and the server is powered on.
- The HDD activity light on the front of the server is doing nothing.
- I cannot telnet in to the server.
Any thoughts on how to narrow down the culprit from here?
trurl Posted June 9, 2015
Beginning to sound like a hardware problem. What is the exact model of your power supply? Did you ever take the time to run an extended memory test?
dgaschk Posted June 9, 2015
Run memtest at least overnight.
JP (Author) Posted June 9, 2015
A sincere thanks for the help thus far. I was able to walk my wife through starting the MEMTEST. I'll just have it run for the next day or two and see what it comes back with. If the MEMTEST reports a failure, is that definitive evidence that the issue is in the RAM, or could the fault be in the motherboard or CPU instead?
I'll circle back with the power supply model once I get home from work. It has been a few years since this server was built, and it has held up pretty well until now.
JP (Author) Posted June 10, 2015
The power supply appears to be an Antec Neo Eco 400C. I guess it could be the issue here, but it has powered this server for five years without trouble. The past 3 have been with the same drives that are in there now.
The MEMTEST has run for 12 hours now with no errors. I'm about to leave to go out of town from Wednesday to Sunday. Should I just let the MEMTEST run the whole time I'm gone, or should I have someone stop by to stop it after a day or two? There won't be much troubleshooting I'll be able to perform until Sunday.
dgaschk Posted June 10, 2015
Jun 8 18:45:57 Tower kernel: ata3: SError: { UnrecovData HostInt 10B8B BadCRC }
10B8B is a SATA path error. Bad or loose SATA cable or a bad or dirty SATA port.
JP (Author) Posted June 10, 2015
Wow, I'm assuming the syslog doesn't tell us which drive...correct?
dgaschk Posted June 10, 2015
~/Downloads/syslog3.txt:587: Jun 8 18:31:56 Tower kernel: ata3.00: ATA-8: SAMSUNG HD203WI, S1UYJ1RZ525479, 1AN10003, max UDMA/133
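The two log lines quoted in this exchange (the SError on ata3 and the ata3.00 identification line) can be tied together with a couple of greps. What follows is only a minimal sketch, assuming the syslog has been saved as a text file; the function names serror_ports and drive_on_port are made up for illustration and are not part of unRAID.

```shell
# Sketch: find ATA ports that logged an SError, then show which drive
# identified itself on each port. Function names are hypothetical.

# serror_ports FILE: print the unique ataN ports that reported an SError.
serror_ports() {
    grep -E 'kernel: ata[0-9]+: SError' "$1" \
        | sed -E 's/.*kernel: (ata[0-9]+): SError.*/\1/' \
        | sort -u
}

# drive_on_port FILE PORT: print the identification line(s) for PORT,
# e.g. the "ata3.00: ATA-8: <model>, <serial>, ..." line for PORT=ata3.
drive_on_port() {
    grep -E "kernel: $2\.[0-9]+: ATA-" "$1"
}

# Usage (the file name is an assumption; use wherever the syslog was saved):
#   for p in $(serror_ports syslog3.txt); do drive_on_port syslog3.txt "$p"; done
```

The model and serial number on the identification line are what tie the port to a physical drive.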
JP (Author) Posted June 10, 2015
Again, wow, and thank you very much. I won't be able to do anything until Sunday since I'm on my way out of town, but I'll replace the SATA cable and see if that makes a difference.
JP (Author) Posted June 15, 2015
I just got back from being out of town. The memtest ran for 5+ days while I was gone without any errors. See below:
I did change the SATA cable for the Samsung drive (the only Samsung drive in the array). When I started the server again I didn't boot into Safe Mode because I didn't catch the boot menu in time. Again, the same issue (freezes) exists with or without Safe Mode, so I wouldn't think the plugins are the source of the problem. I started the array and it began a parity check. It didn't take long: as soon as a file started being saved to the server it froze up again, and I had to shut it down manually from the switch on the power supply. Again, no luck getting to the syslog when it freezes. I wish I could.
The only other thing I can think of right now is to change which SATA port on the motherboard the Samsung drive is plugged into. I'll try that this evening and see if the freezing remains.
JP (Author) Posted June 16, 2015
Drat...still no joy. Following dgaschk's recommendation, I looked for dust and loose SATA cables. There was some dust, but I wouldn't consider it excessive. Regardless, I hit it all with a considerable amount of compressed air. I pulled some of the SATA cables (specifically, the Samsung drive's) and reattached them to ensure they were secure.
I then started up the server and it started a parity check. It wasn't finding any errors, so I stopped the parity check early to test how transferring data went. In Safe Mode I transferred about 20 gigs. Everything worked fine so I was optimistic. I then rebooted to try transfers with the plugins. It hung while attempting to reboot and I had to shut it down manually. Once it came back up it did not detect an unclean shutdown, which was good. With the plugins enabled everything worked fine for a while, but then the issue resurfaced and the server froze again. As usual I could not get to the syslog, and just before this the drives reflected no errors and were all green.
Since dgaschk noticed the SError with the Samsung drive I thought I should change SATA ports and see if that makes any difference. I'm 99% sure this is the case, but if I don't ask I'll manage to screw something up: all the drives simply need to be in the same slot (Disk 1, Disk 2, etc.) as before...correct? As you can see from the screenshot below they are, but the device labels have changed. SDC for Disk 1 (Samsung) is now SDB (Samsung). This is normal and expected since I've changed SATA ports...correct? Those labels can change, but the actual slot assignments must remain the same as before or the array won't work, that is, the Samsung drive that was Disk 1 must still be Disk 1. Again, I would just like to be sure before starting the array again. I haven't had to change SATA ports in 5 years.
dgaschk Posted June 16, 2015
The sdX assignments can change every boot. They have nothing to do with physical connections. The data drives can be in any order. Only the parity drive must be in the correct slot.
There is a Log button in the upper right corner of the unRAID webGUI. First make a copy of the current log (syslog.txt), then click on the Log button. Leave the Log window open until the server crashes. Copy and paste the entire contents of the log window into a second text file (syslogB.txt). Attach both syslog.txt and syslogB.txt to a post.
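As a complement to keeping the Log window open, the live syslog can also be copied somewhere that survives a reboot at regular intervals. The following is a rough sketch rather than an official unRAID feature: snapshot_log is a hypothetical helper, and the destination under /boot assumes the flash drive is mounted there, as on a stock install.

```shell
# snapshot_log SRC DEST: copy SRC to DEST and flush it to disk, so the copy
# has a chance of surviving a hard freeze. Hypothetical helper name.
snapshot_log() {
    # Write to a temporary name first so DEST is never left half-written.
    cp "$1" "$2.tmp" && mv "$2.tmp" "$2" && sync
}

# Usage from a telnet session (destination path is an assumption):
#   while true; do snapshot_log /var/log/syslog /boot/syslog-latest.txt; sleep 60; done
```

Anything logged between the last snapshot and the freeze is still lost, and frequent writes add wear to the flash drive, so this is best treated as a temporary troubleshooting aid.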
JP (Author) Posted June 16, 2015
Thanks. This was in Safe Mode. I started the array and a parity check began. I cancelled the parity check after about 30 minutes. There were no sync errors. A few file transfers went across just fine. Then I tried a 22 gig file. At about 4 gigs the server froze again. No access to the web interface. The monitor connected to the server went dark. The server itself had power and fans were spinning, but there didn't appear to be any HDD activity.
The logs were created as you instructed and they are attached.
Attachments: sysloga.txt, syslogb.txt
dgaschk Posted June 16, 2015
The cache drive is not configured as AHCI in BIOS. It should preferably be AHCI, or else "SATA only". Are you using the cache drive? Fix BIOS and try again. Disable the cache drive and try again.
JP (Author) Posted June 16, 2015
Thanks. The cache drive is not a SATA drive. It is a PATA / IDE (I think that is what they called it) drive. I don't use the "mover" option. I use the cache drive pretty much only for running plugins, since I don't want the rest of the drives spinning all the time.
I only see one option for AHCI in the BIOS and it is selected. I'll take a screenshot and pass it on once I test disabling the cache drive to see what happens.
dgaschk Posted June 16, 2015
If it is actually IDE then BIOS is correct. Try copying to a disk share.
JP (Author) Posted June 16, 2015
I disabled the cache to see if that would provide any new information. However, when attempting to write a 20 gig file directly to Disk 4 the server froze again. I did start the log file just in case it might help. It is below. Rebooting now to try the same file with Disk 3, 2, and 1 to see if there is any difference.
/usr/bin/tail -f /var/log/syslog
Jun 16 18:09:58 Tower avahi-daemon[1521]: Files changed, reloading.
Jun 16 18:09:58 Tower avahi-daemon[1521]: Service group file /services/smb.service changed, reloading.
Jun 16 18:09:58 Tower emhttp: shcmd (108): ps axc | grep -q rpc.mountd
Jun 16 18:09:58 Tower emhttp: Restart NFS...
Jun 16 18:09:58 Tower emhttp: shcmd (109): exportfs -ra |& logger
Jun 16 18:09:58 Tower emhttp: shcmd (110): /usr/local/sbin/emhttp_event svcs_restarted
Jun 16 18:09:58 Tower emhttp_event: svcs_restarted
Jun 16 18:09:58 Tower emhttp: shcmd (111): /usr/local/sbin/emhttp_event started
Jun 16 18:09:58 Tower emhttp_event: started
Jun 16 18:09:58 Tower avahi-daemon[1521]: Service "Tower" (/services/smb.service) successfully established.
JP (Author) Posted June 17, 2015
Well, Disk 1 and Disk 4 failed the 20 gig transfer. The log file doesn't show any new information. It is so odd. None of the drives show any errors in the web interface. I'm not sure how to narrow this down.
dgaschk Posted June 17, 2015
See check disk filesystems in my sig.
JP (Author) Posted June 17, 2015
Thanks again. So I should run reiserfsck for all data drives and the cache drive...correct? I'm running unRAID 5.0.5. This is after unRAID v5.0-beta8d...correct? I'm a Linux idiot so if I don't ask I'll screw it up. The wiki uses /dev/md1 as the example. If I want to run reiserfsck on data drive 1, do I enter the same thing or should the syntax be different? Typically, I associate sdb with drive 1. The directions say not to, but what would be the command to run reiserfsck on the parity drive?
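On the naming question: the wiki's /dev/md1 example refers to array data Disk 1, so Disk N is checked through /dev/mdN rather than through its sdX name, which (as noted earlier in the thread) can change between boots. The parity drive holds no filesystem, so there is nothing for reiserfsck to check on it. The sketch below only prints the read-only check command for a given disk number instead of running it; fsck_cmd_for_disk is a hypothetical helper, and the wiki page remains the authority on the exact procedure for a given unRAID version.

```shell
# Print (do not run) the read-only reiserfsck command for array data disk N.
# fsck_cmd_for_disk is a hypothetical helper, not part of unRAID.
fsck_cmd_for_disk() {
    case $1 in
        # Reject empty input, anything non-numeric (such as "sdb"), and 0.
        ''|*[!0-9]*|0)
            echo "usage: fsck_cmd_for_disk N (N is a disk number, 1 or higher)" >&2
            return 1
            ;;
    esac
    printf 'reiserfsck --check /dev/md%s\n' "$1"
}

fsck_cmd_for_disk 1   # prints: reiserfsck --check /dev/md1
fsck_cmd_for_disk 4   # prints: reiserfsck --check /dev/md4
```

Printing the command first makes it easy to confirm the device name before anything touches the disk.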