July 10, 2024
This used to happen maybe once every 3 or 4 months, and now it seems to be happening nightly, even after a clean shutdown and restart. My NIC appears to just stop responding to pings etc., even though I can see a link light on the NIC and the switch port, as well as activity on the NIC side. I can't ping it. As troubleshooting, one of the times during this "fugue" state I power cycled my cable modem and wifi router, then my switch, to no avail. I then gracefully power cycled the server and it seemingly came back up without issue.
This last time, yesterday, 1 disk (disk 5) came up as "disabled and currently being emulated", even though SMART says it is good. This morning I looked and the server was seemingly locked up, not responding to keyboard input but still powered (the server is on a UPS and connected via USB).
I've had this server for 10+ years without issue, and this is a fresh mainboard, CPU etc. from about 4 years ago. Upon boot up this morning I noticed something odd in the screen logs (unraid.4.jpg). Any help is appreciated.
unraid-diagnostics-20240710-0911.zip unraid-diagnostics-20240709-1538.zip
July 10, 2024
Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
### [PREVIOUS LINE REPEATED 5 TIMES] ###
Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Looks like a NIC problem; try an add-on NIC if available.
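For anyone who wants to check their own syslog for the same symptom, here is a minimal Python sketch. It assumes the lines look exactly like the ones quoted above; the function name `find_r8169_timeouts` and the regular expression are made up for illustration and are not part of Unraid or the r8169 driver.

```python
import re

# Matches r8169 register-poll timeout messages in the format quoted above, e.g.
# "kernel: r8169 0000:08:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100)."
# The pattern is an assumption based on those sample lines only.
R8169_TIMEOUT = re.compile(
    r"kernel: r8169 (?P<pci>\S+) (?P<iface>\w+): "
    r"(?P<cond>rtl_\w+_cond) == \d+ \(loop: \d+, delay: \d+\)"
)


def find_r8169_timeouts(lines):
    """Return (interface, condition) pairs for every matching syslog line."""
    hits = []
    for line in lines:
        match = R8169_TIMEOUT.search(line)
        if match:
            hits.append((match.group("iface"), match.group("cond")))
    return hits


sample = [
    "Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).",
    "Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).",
    "Jul 10 06:02:26 unraid kernel: r8169 0000:08:00.0 eth0: rtl_eriar_cond == 1 (loop: 100, delay: 100).",
]

for iface, cond in find_r8169_timeouts(sample):
    print(iface, cond)
```

To run it against a real log, pass the open file instead of `sample`, for example `find_r8169_timeouts(open("syslog.txt"))`, where `syslog.txt` is a placeholder for wherever you extracted the diagnostics.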
July 10, 2024 Author
Thanks, moderator. I was trying to find a URL for the hardware recommendations. I350 Intel chipset, but I'd love some recommendations. I assume the disk disablement is because of this, and that I'll have to do a rebuild once I get the new NIC?
July 10, 2024
8 minutes ago, johnny.ink said: "I assume the disk disablement is because of this"
Possibly. Note that there were errors with multiple disks at the same time, suggesting a power or controller problem:
Jul 9 05:01:24 unraid kernel: md: disk1 read error, sector=2419158984
Jul 9 05:01:24 unraid kernel: md: disk2 read error, sector=2419158984
Jul 9 05:01:24 unraid kernel: md: disk3 read error, sector=2419158984
Any Intel gigabit NIC should be fine.
July 15, 2024 Author
I installed an Intel NIC this Saturday and all appeared fine. I also disabled the onboard Broadcom NIC on the mainboard via the BIOS, but it appears the issue is still there. At this point I assume it's the mainboard, but do the logs show anything specific? I looked but couldn't really decipher them, so any help is appreciated. Yesterday unRAID stayed up for over 24 hours and I was able to stream from Plex off of it, but I woke up to the unRAID GUI being unavailable and the IP not responding.
unraid-diagnostics-20240715-0858.zip
July 15, 2024 Author
If it happens during the day while I work, I'll document the approximate time frame and post diagnostics.
July 16, 2024 Author
JorgeB - I had the server going yesterday and I can verify it stayed up and active until some time between 2 am and 8:30 am. I woke up and saw it down. I think the mainboard is the culprit, as 5 of the drives (including the ones listed above) are connected to onboard SATA and 3 drives are on the Dell controller. Here are the diags. I assume it's the mainboard, but if the logs show anything, that would be super helpful in limiting the variables. Thanks in advance sir!
unraid-diagnostics-20240716-0939.zip
July 16, 2024 Solution
The syslog-previous doesn't show the beginning of the problem, but it does show this:
Jul 16 04:41:40 unraid kernel: md: disk1 read error, sector=2378833488
Jul 16 04:41:40 unraid kernel: md: disk2 read error, sector=2378833488
Jul 16 04:41:40 unraid kernel: md: disk3 read error, sector=2378833488
They are all connected to the onboard SATA controller, together with disk 5, which also appears to have been affected, so this is possibly a board issue. If all these disks share anything else, like a power splitter, it could also be that.
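The pattern being pointed out here (several disks reporting a read error in the same second at the same sector) can be picked out of a syslog with a short script. This is only a rough sketch, assuming the `md:` lines are formatted exactly as quoted above; the function name `simultaneous_read_errors` and the `min_disks` threshold are hypothetical and not part of Unraid. It only flags the coincidence and cannot tell a board fault from a shared power splitter.

```python
import re
from collections import defaultdict

# Matches md read-error lines in the format quoted above, e.g.
# "Jul 16 04:41:40 unraid kernel: md: disk1 read error, sector=2378833488"
# The pattern is an assumption based on those sample lines only.
MD_READ_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"md: (?P<disk>disk\d+) read error, sector=(?P<sector>\d+)"
)


def simultaneous_read_errors(lines, min_disks=2):
    """Group read errors by (timestamp, sector) and keep the groups that
    involve at least `min_disks` disks."""
    groups = defaultdict(list)
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            key = (match.group("stamp"), int(match.group("sector")))
            groups[key].append(match.group("disk"))
    return {key: disks for key, disks in groups.items() if len(disks) >= min_disks}


sample = [
    "Jul 16 04:41:40 unraid kernel: md: disk1 read error, sector=2378833488",
    "Jul 16 04:41:40 unraid kernel: md: disk2 read error, sector=2378833488",
    "Jul 16 04:41:40 unraid kernel: md: disk3 read error, sector=2378833488",
]

for (stamp, sector), disks in simultaneous_read_errors(sample).items():
    print(stamp, sector, ", ".join(disks))
```

Disks that show up together in one group are worth tracing back to whatever they physically share: the same controller, the same cable run, or the same power lead.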
July 16, 2024 Author
Thank you Jorge. I suspected it was the mainboard, but having someone who can read the logs better than I can is helpful. Thank you sir.