April 3, 2012
Hi everyone! My server has been up for about 2 years now. I had no problems whatsoever until several weeks ago, when I noticed that the server was inaccessible through the web interface at tower/main.htm. Later, none of my mapped network drives to the tower would connect. Even telnet connections failed. Only a hard reboot gets it working again, and it takes about a week before the whole thing repeats itself and the server becomes disconnected again. I've attached my syslog in the hope that some kind soul can help me figure out the root cause of my problem. Many thanks to the Unraid community.
Attachment: syslog.txt
April 3, 2012
Note that you have some parity errors:
  Apr 2 22:41:12 Tower kernel: md: parity incorrect: 730857472
Run and post SMART reports for your drives.
Edit: Not nearly as important, and it can be fixed later, but you also have 2 ports set as "ide" rather than AHCI:
  Apr 2 18:37:08 Tower emhttp: pci-0000:00:14.1-ide-1:0 ide1 (hdc) ST32000542AS_6XW07FLL
  Apr 2 18:37:08 Tower emhttp: pci-0000:00:14.1-ide-1:1 ide1 (hdd) ST32000542AS_5XW050P8
April 4, 2012 (Author)
Hello Bryan. I captured another syslog tonight and I'm attaching it here together with my SMART reports. It looks like I have more parity errors, although these could be from shutting off the server without a proper shutdown command. Thanks for the help you're giving me.
Attachments: syslog-04-03-2012-latest.txt, smart_report_combined.txt
April 4, 2012
Nothing glaring with your disks (thanks for pointing out your parity drive). You had a few single CRC errors (cabling likely), but otherwise they are clean. Run a correcting parity check to remove the errors. I also noticed that you lost your ethernet connection during the last parity check. Do you know the reason?
  Apr 3 17:37:04 Tower kernel: md: parity incorrect: 3445240056
  Apr 3 17:58:02 Tower kernel: r8169: eth0: link down
  Apr 3 17:58:02 Tower ifplugd(eth0)[1508]: Link beat lost.
  Apr 3 17:58:09 Tower kernel: r8169: eth0: link up
  Apr 3 17:58:10 Tower ifplugd(eth0)[1508]: Link beat detected.
  Apr 3 18:30:58 Tower kernel: md: parity incorrect: 3596126248
Also, which ports are these drives connected to?
  Apr 2 22:49:20 Tower kernel: hdc: ST32000542AS, ATA DISK drive
  Apr 2 22:49:20 Tower kernel: hdd: ST32000542AS, ATA DISK drive
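For anyone who wants to pull the same two kinds of events out of a long syslog without reading it line by line, here is a minimal Python sketch. It assumes the log uses exactly the message formats quoted in this thread ("md: parity incorrect: N" and "eth0: link up/down"); the function name summarize_syslog is made up for illustration and is not part of Unraid.

```python
import re

# Patterns match the kernel messages quoted in this thread.
PARITY_RE = re.compile(r"md: parity incorrect: (\d+)")
LINK_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) .*eth0: link (up|down)")


def summarize_syslog(lines):
    """Return (parity_sectors, link_events) found in an iterable of log lines."""
    parity_sectors = []
    link_events = []
    for line in lines:
        match = PARITY_RE.search(line)
        if match:
            parity_sectors.append(int(match.group(1)))
            continue
        match = LINK_RE.search(line)
        if match:
            # (timestamp, "up" or "down")
            link_events.append((match.group(1), match.group(2)))
    return parity_sectors, link_events


# Sample lines copied from the post above.
SAMPLE = """\
Apr 3 17:37:04 Tower kernel: md: parity incorrect: 3445240056
Apr 3 17:58:02 Tower kernel: r8169: eth0: link down
Apr 3 17:58:02 Tower ifplugd(eth0)[1508]: Link beat lost.
Apr 3 17:58:09 Tower kernel: r8169: eth0: link up
Apr 3 17:58:10 Tower ifplugd(eth0)[1508]: Link beat detected.
Apr 3 18:30:58 Tower kernel: md: parity incorrect: 3596126248
""".splitlines()

parity, links = summarize_syslog(SAMPLE)
print("parity errors:", len(parity))
print("link events:", links)
```

To run it against a saved log instead of the sample, pass an open file, for example summarize_syslog(open("syslog.txt")). A long list of link events clustered around the times the server becomes unreachable would point at the network path rather than the disks.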
April 4, 2012 (Author)
The lost connection is the main issue that I'm trying to resolve. I can't point to anything that could have caused it. I lose the connection every so often while accessing my server, whether I'm copying or writing a file, browsing through mapped directories, or watching a movie on my Popcorn Hour. Could this be due to a bad NIC?
April 6, 2012 (Author)
So far, I've checked and replaced the following:
  - Switch
  - Cable
  - NIC
Sadly, I'm still getting the same problem. What's weird is that sometimes I can't browse tower/main.htm but can still view the contents of my mapped Unraid drives. Other times it's the reverse: the web page loads but I can't get to any of my mapped Unraid drives. This is truly weird, to say the least.
April 9, 2012
What does "ethtool eth0" show? If you run the command in a telnet window, the text can be copied and pasted.
April 9, 2012
23399 DROPPED packets pretty much tells you the link is not working as expected. There should be NO dropped packets.
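The drop counters that ifconfig reports can also be read from /proc/net/dev on Linux, which makes it easy to check whether the number is still climbing. The sketch below is only an illustration: the function name parse_proc_net_dev is made up, and the numbers in the sample text are invented placeholders, not values from this server.

```python
def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into per-interface counters."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        # Split on the colon first: older kernels may print "eth0:12345"
        # with no space between the interface name and the first counter.
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue
        # Columns 0-7 are receive counters, 8-15 are transmit counters.
        stats[name.strip()] = {
            "rx_packets": fields[1],
            "rx_errs": fields[2],
            "rx_drop": fields[3],
            "tx_packets": fields[9],
            "tx_errs": fields[10],
            "tx_drop": fields[11],
        }
    return stats


# Invented sample in the /proc/net/dev layout (placeholder numbers).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:5000000    4000    0   12    0     0          0         0  3000000    2500    0    0    0     0       0          0
"""

stats = parse_proc_net_dev(SAMPLE)
print(stats["eth0"]["rx_drop"], stats["eth0"]["tx_drop"])
```

On a live Linux machine you would pass open("/proc/net/dev").read() instead of the sample. Reading the counters twice a few minutes apart shows whether drops are still accumulating after a hardware change or are left over from before it.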
April 9, 2012 (Author)
I've changed my Switch, NIC Card and cable. Is there anything that I need to look into?
April 10, 2012 (Author)
dgaschk: Thanks for the tip on how to copy and paste. Here's what ethtool eth0 says:
  Linux 2.6.32.9-unRAID.
  root@Tower:~# ethtool eth0
  Settings for eth0:
          Supported ports: [ TP MII ]
          Supported link modes:   10baseT/Half 10baseT/Full
                                  100baseT/Half 100baseT/Full
                                  1000baseT/Half 1000baseT/Full
          Supports auto-negotiation: Yes
          Advertised link modes:  10baseT/Half 10baseT/Full
                                  100baseT/Half 100baseT/Full
                                  1000baseT/Half 1000baseT/Full
          Advertised auto-negotiation: Yes
          Speed: 1000Mb/s
          Duplex: Full
          Port: MII
          PHYAD: 0
          Transceiver: internal
          Auto-negotiation: on
          Supports Wake-on: pumbg
          Wake-on: g
          Current message level: 0x00000033 (51)
          Link detected: yes
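If you want to check the negotiated link settings automatically, for example after each cable or switch swap, the single-line "Key: value" fields of that output are easy to parse. This is a rough sketch: parse_ethtool is a hypothetical helper name, and it deliberately skips the multi-line fields such as the link mode lists.

```python
def parse_ethtool(text):
    """Collect the single-line 'Key: value' fields from ethtool output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip headers ("Settings for eth0:") and continuation lines.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


# A subset of the output posted above.
SAMPLE = """\
Settings for eth0:
        Speed: 1000Mb/s
        Duplex: Full
        Port: MII
        Auto-negotiation: on
        Link detected: yes
"""

settings = parse_ethtool(SAMPLE)
negotiated_ok = (
    settings["Speed"] == "1000Mb/s"
    and settings["Duplex"] == "Full"
    and settings["Link detected"] == "yes"
)
print(settings)
print("gigabit full duplex with link:", negotiated_ok)
```

A clean result here only says the link negotiated gigabit full duplex at the moment the command ran. It does not rule out intermittent link drops or dropped packets, so it is worth reading alongside the syslog and the interface counters.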
April 10, 2012
> I've changed my Switch, NIC Card and cable. Is there anything that I need to look into?
Did this fix it? What does ifconfig show now?
April 10, 2012 (Author)
Sorry, the ifconfig output that I posted a while back was taken AFTER I made all the changes, and the problem is still there. :'( I'm now contemplating rebuilding my 2-year-old server. The only question is where to put my files in the meantime.
April 23, 2012 (Author)
I have since installed unMenu, and looking at the syslog through it shows a lot more errors than I was hoping for. I still can't make heads or tails of it, but here's hoping someone smarter can help me figure this out. I'm getting lots of "Buffer I/O error on device sdi".
Attachment: syslog-2012-04-22.txt
April 23, 2012 (Author)
That's the thing... looking through the list of devices on my server, I don't see a drive that's named "sdi".
April 23, 2012
Is sdi a 100 MB disk or flash drive?
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] 206080 512-byte logical blocks: (105 MB/100 MiB)
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Write Protect is off
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Mode Sense: 03 00 00 00
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Assuming drive cache: write through
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Assuming drive cache: write through
  Apr 22 18:43:51 Tower kernel: sdi: sdi1
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Assuming drive cache: write through
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Attached SCSI removable disk
  Apr 22 18:43:51 Tower kernel: usb 6-3: USB disconnect, address 4
  Apr 22 18:43:51 Tower kernel: ohci_hcd 0000:00:13.1: dev 3 ep1in scatterlist error -108/-108
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Unhandled error code
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] Result: hostbyte=0x07 driverbyte=0x00
  Apr 22 18:43:51 Tower kernel: sd 7:0:0:0: [sdi] CDB: cdb[0]=0x28: 28 00 00 00 00 60 00 00 f0 00