drawde

Members
  • Content Count

    320

Community Reputation

0 Neutral

About drawde

  • Rank
    Advanced Member

Converted

  • Gender
    Undisclosed

  1. Hello all, I had unRAID reboot on me randomly today. I have a script that copies the log to another location every 5 minutes, but I didn't see anything in there. When it came back online, Fix Common Problems reported an issue:

     "Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged."

     I didn't have mcelog installed, so I installed it. But I guess this only logs going forward? In any case, my logs from after the reboot (before mcelog was installed) showed the following messages:

         May 28 00:16:00 Tower kernel: smpboot: CPU0: AMD Ryzen 7 1700 Eight-Core Processor (family: 0x17, model: 0x1, stepping: 0x1)
         May 28 00:16:00 Tower kernel: Performance Events: Fam17h core perfctr, AMD PMU driver.
         May 28 00:16:00 Tower kernel: ... version: 0
         May 28 00:16:00 Tower kernel: ... bit width: 48
         May 28 00:16:00 Tower kernel: ... generic registers: 6
         May 28 00:16:00 Tower kernel: ... value mask: 0000ffffffffffff
         May 28 00:16:00 Tower kernel: ... max period: 00007fffffffffff
         May 28 00:16:00 Tower kernel: ... fixed-purpose events: 0
         May 28 00:16:00 Tower kernel: ... event mask: 000000000000003f
         May 28 00:16:00 Tower kernel: rcu: Hierarchical SRCU implementation.
         May 28 00:16:00 Tower kernel: smp: Bringing up secondary CPUs ...
         May 28 00:16:00 Tower kernel: x86: Booting SMP configuration:
         May 28 00:16:00 Tower kernel: .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13
         May 28 00:16:00 Tower kernel: mce: [Hardware Error]: Machine check events logged
         May 28 00:16:00 Tower kernel: mce: [Hardware Error]: CPU 13: Machine Check: 0 Bank 5: bea0000000000108
         May 28 00:16:00 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff81654a1a MISC d012000101000000 SYND 4d000000 IPID 500b000000000
         May 28 00:16:00 Tower kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1559016932 SOCKET 0 APIC d microcode 8001126
         May 28 00:16:00 Tower kernel: #14 #15
         May 28 00:16:00 Tower kernel: smp: Brought up 1 node, 16 CPUs

     Is the zenstates fix still required for 6.7? I already have C-states disabled in my BIOS. tower-diagnostics-20190528-0431.zip
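For anyone reading the machine check lines above by hand, here is a minimal Python sketch that pulls apart the two raw numbers in the log: the bank status word (`bea0000000000108`) and the `TIME` field. It only decodes the flag bits 63 to 57 that are common to x86 machine check status registers, plus the low 16-bit error code; the vendor-specific bits in between are deliberately left alone. The function names are made up for illustration, and this is not a substitute for mcelog's own decoding.

```python
from datetime import datetime, timezone

# Flag bits shared by x86 machine-check status registers (bit position -> name).
# Vendor-specific bits below bit 57 are intentionally not decoded here.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # reporting for this error was enabled
    59: "MISCV",  # the MISC register holds valid data
    58: "ADDRV",  # the ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mce_status(status: int) -> dict:
    """Return the set flag names and the low 16-bit error code (hypothetical helper)."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"flags": flags, "error_code": status & 0xFFFF}


def mce_time_utc(epoch_seconds: int) -> datetime:
    """Convert the TIME field (seconds since the Unix epoch) to a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(decode_mce_status(0xBEA0000000000108))
    print(mce_time_utc(1559016932).isoformat())
```

With the values from the log, the status word has UC and PCC set and OVER clear, and the timestamp converts to 2019-05-28 04:15:32 UTC. What that means for the hardware is still a question for the diagnostics.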
  2. How concerned should I be? I upgraded to 6.7 a few hours ago. I'm not sure if it's related, but I saw these errors in my syslog after CA updated my dockers (as usual on Sunday AM):

         May 12 00:19:32 Tower rc.diskinfo[4610]: SIGHUP received, forcing refresh of disks info.
         May 12 00:19:32 Tower kernel: EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
         May 12 00:19:32 Tower kernel: Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
         May 12 00:19:32 Tower kernel: (Note that use of the override may cause unknown side effects.)
         May 12 00:19:37 Tower kernel: sd 13:0:0:0: [sdb] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
         May 12 00:19:37 Tower kernel: sd 13:0:0:0: [sdb] tag#4 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
         May 12 00:19:37 Tower kernel: print_req_error: I/O error, dev sdb, sector 3907028992
         May 12 00:19:37 Tower kernel: sd 13:0:3:0: [sde] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
         May 12 00:19:37 Tower kernel: sd 13:0:3:0: [sde] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
         May 12 00:19:37 Tower kernel: print_req_error: I/O error, dev sde, sector 7814036992
         May 12 00:19:41 Tower kernel: sd 13:0:1:0: [sdc] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
         May 12 00:19:41 Tower kernel: sd 13:0:1:0: [sdc] tag#2 CDB: opcode=0x88 88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00
         May 12 00:19:41 Tower kernel: print_req_error: I/O error, dev sdc, sector 7814036992
         May 12 00:19:46 Tower kernel: sd 13:0:11:0: [sdm] tag#23 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x00
         May 12 00:19:46 Tower kernel: sd 13:0:11:0: [sdm] tag#23 CDB: opcode=0x28 28 00 e8 e0 88 00 00 00 08 00
         May 12 00:19:46 Tower kernel: print_req_error: I/O error, dev sdm, sector 3907028992
         May 12 00:19:46 Tower rc.diskinfo[4610]: SIGHUP received, forcing refresh of disks info.
         May 12 00:19:46 Tower rc.diskinfo[4610]: SIGHUP ignored - already refreshing disk info.
         [the "SIGHUP ignored" line above repeats many more times through 00:19:47]
         May 12 00:19:47 Tower rc.diskinfo[4610]: SIGHUP received, forcing refresh of disks info.
         May 12 00:19:47 Tower rc.diskinfo[4610]: SIGHUP ignored - already refreshing disk info.
         May 12 00:19:47 Tower rc.diskinfo[4610]: SIGHUP ignored - already refreshing disk info.
         May 12 00:19:48 Tower rc.diskinfo[4610]: SIGHUP received, forcing refresh of disks info.

     The webUI does not show any errors next to any of the disks, though, and all disks are green. I've seen the rc.diskinfo and "ECC disabled" messages before. I don't have ECC RAM, so I don't think that's something I should care about, and I believe the rc.diskinfo messages are just from the preclear plugin, which I'm not running, so I'm not too worried about those. It's mostly the print_req_error I/O errors I'm concerned about. It's the same 2 sectors across 4 different drives?

         May 12 00:19:37 Tower kernel: print_req_error: I/O error, dev sdb, sector 3907028992
         May 12 00:19:46 Tower kernel: print_req_error: I/O error, dev sdm, sector 3907028992
         May 12 00:19:37 Tower kernel: print_req_error: I/O error, dev sde, sector 7814036992
         May 12 00:19:41 Tower kernel: print_req_error: I/O error, dev sdc, sector 7814036992

     tower-diagnostics-20190512-0508.zip
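As a cross-check on the "same 2 sectors" observation, the CDB bytes in the log can be decoded directly. The sketch below is a rough Python helper (the function name is made up) that reads the starting LBA and block count out of a READ(10) (opcode 0x28) or READ(16) (opcode 0x88) command. The 512-byte logical sector size is an assumption, since the post does not state it.

```python
from typing import Tuple

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors, not stated in the post


def decode_read_cdb(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, block count) for a READ(10) or READ(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x28:    # READ(10): 4-byte LBA, 2-byte transfer length
        lba = int.from_bytes(cdb[2:6], "big")
        count = int.from_bytes(cdb[7:9], "big")
    elif opcode == 0x88:  # READ(16): 8-byte LBA, 4-byte transfer length
        lba = int.from_bytes(cdb[2:10], "big")
        count = int.from_bytes(cdb[10:14], "big")
    else:
        raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    return lba, count


if __name__ == "__main__":
    for cdb in (
        "28 00 e8 e0 88 00 00 00 08 00",                    # sdb, sdm
        "88 00 00 00 00 01 d1 c0 be 00 00 00 00 08 00 00",  # sde, sdc
    ):
        lba, count = decode_read_cdb(cdb)
        print(lba, count, lba * SECTOR_BYTES)
```

Both commands are 8-block reads whose LBAs match the sectors the kernel printed. Under the 512-byte assumption the byte offsets come out at roughly 2.0 TB and 4.0 TB, which would be consistent with something reading near the end of 2 TB and 4 TB drives. The drive sizes themselves are a guess here and should be confirmed from the diagnostics.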
  3. I downloaded coppit's plg from GitHub, edited the fields SimonF mentioned, copied the file to /boot/config/plugins and installed the plg via the webUI, and SNMP is working again on 6.7 (stable). I didn't install netsnmp beforehand as mentioned; I figured the plg would install it, which it did. I haven't tried rebooting to see if it installs on boot yet (but I don't see why it wouldn't). If anything, I can just install the new plg manually again.
  4. I thought something was wrong with my Plex install, since that was the docker I kept having to restart. I wiped out all the databases and metadata and had it rescrape/rebuild everything. That doesn't seem to have solved the issue. I've also disabled C-states in my BIOS, since apparently that's a thing for gen 1 Ryzen. I'm starting to think that high CPU usage overall is causing docker to hang or become unresponsive at times. When my CPU utilization is high (nzbget downloading/unpacking something while Plex is transcoding), all my dockers slow down almost to a halt. Oddly, I didn't have this issue on my Phenom II (PassMark 3.5k), even though the Ryzen scores 13.7k. Some additional information, although I'm not sure if it's important: I monitor my system with Observium and the SNMP plugin, just to output a graph to a small status page I have. During these "lockups" there are gaps in the data, I'm guessing because either the Observium docker or the SNMP plugin (or both) stops working along with most of my other dockers. When CPU usage goes down, it will typically start again. There are no errors or anything in the syslog during these times. Any thoughts? Could it be the I/O that is causing issues? Should I move my appdata to an SSD?
  5. No, I currently have mover running only once per month. I've been seeing these lockups every few days.
  6. Hey all. I recently upgraded my CPU, mobo, RAM, and SATA card(s) (basically almost everything aside from the PSU and HDDs) from an old AMD Phenom II based system with a jumble of random RAID/SATA controllers. It was working fine and was stable with the old hardware, but it just felt like it was time for an upgrade. Every couple of days I'm seeing issues where CPU utilization will spike and load averages hit high double digits (20-40+). Most things I'm running (dockers) still work, albeit very slowly. Usually I'm able to resolve it by starting/stopping some of the dockers at random. Eventually I'll just stop/start all of them, and that almost always fixes it. The last time this happened, I ran docker stats first and none of them were using a particularly large amount of CPU; one of them was at 25% and the rest were in single digits. Restarting that docker resolved the issue, leading me to believe that is where the problem is. I haven't changed or added any dockers since changing hardware. The new hardware should be noticeably better than my old hardware, but it seems like it's struggling to keep up with the load at certain times, whereas on the old hardware I've gone months and months without touching it at all. Any thoughts/suggestions?

     Current hardware:
       • Gigabyte AB-350 Gaming 3
       • AMD Ryzen 7 1700
       • G.SKILL NT Series 16GB
       • LSI 6Gbps SAS HBA LSI 9201-8i
       • IBM 46M0997 ServeRAID Expansion Adapter 16-Port SAS Expander
       • Norco RPC-4220 case/backplane

     tower-diagnostics-20190423-1415.zip
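To put load averages like 20-40 in context on a 16-thread CPU, a quick sanity check is to divide the load average by the number of logical CPUs. Below is a minimal Python sketch of that arithmetic. The helper name is made up, and the example values are taken from the range mentioned in the post. Keep in mind that on Linux the load average also counts tasks stuck in uninterruptible I/O wait, so a high value with modest per-container CPU usage can point at storage rather than the CPU.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Load average divided by logical CPU count (hypothetical helper).

    Values well above 1.0 mean more runnable or I/O-blocked tasks than CPUs.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_avg / cpu_count


if __name__ == "__main__":
    # Live reading on the machine this runs on (Unix-like systems only).
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"current: {one_minute:.2f} over {cpus} CPUs = {load_per_cpu(one_minute, cpus):.2f}")

    # Figures from the post: load of 20 to 40 on a 16-thread Ryzen 7 1700.
    for load in (20.0, 40.0):
        print(f"load {load:.0f} / 16 CPUs = {load_per_cpu(load, 16):.2f}")
```

For the numbers in the post this works out to 1.25 and 2.5 tasks per logical CPU.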
  7. Where can I set this? Under scheduler settings I only have parity check and mover.
  8. There is also the SNMP plugin, which lets you send temperature info to an NMS like Observium or LibreNMS.
  9. For, I want to say, the last 2 times I've upgraded unRAID, I didn't get any notifications for the unRAID plg like I normally do. Has this been removed, and do we have to manually check for updates now?
  10. drawde

    UPS load @ 50%

    I guess I'll keep the battery and move it somewhere else, just for my laptop and a couple of smaller devices. Any recommendations for a slightly beefier UPS?
  11. drawde

    UPS load @ 50%

    Yes, the battery is definitely due to be replaced and is actually on order right now. apcupsd is reporting about 6 minutes @ 342 Watt (57.0%). I don't remember what the runtime was when I first got the UPS. I'm assuming I could expect a better runtime with a new battery?
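The apcupsd figures above can be turned into a rough capacity estimate. The following Python sketch (function names are made up) backs out the implied watt rating from the reported load and percentage, and estimates how much energy the battery delivers over the reported runtime. It treats the load as constant and takes the 6-minute estimate at face value, so the result is a ballpark figure only.

```python
def rated_watts(load_watts: float, load_percent: float) -> float:
    """Implied full-load rating, given the current load and its percentage."""
    return load_watts / (load_percent / 100.0)


def delivered_watt_hours(load_watts: float, runtime_minutes: float) -> float:
    """Energy delivered over the runtime, assuming a constant load."""
    return load_watts * runtime_minutes / 60.0


if __name__ == "__main__":
    load_w, load_pct, runtime_min = 342.0, 57.0, 6.0  # figures reported by apcupsd
    print(f"implied rating: {rated_watts(load_w, load_pct):.0f} W")
    print(f"energy at this load: {delivered_watt_hours(load_w, runtime_min):.1f} Wh")
```

With the reported numbers this gives an implied rating of about 600 W and roughly 34 Wh delivered at the current load.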
  12. drawde

    UPS load @ 50%

    I have a CyberPower CP1000PFCLCD UPS for my main desktop, unraid server and a network switch. I'm not sure about full load but about half load on my desktop and rebuilding a drive in unraid, my UPS is reporting about 45-50% load and about a 6-8 minute runtime. Is this good enough or is it time to upgrade my UPS? I just need to survive quick power blips and enough time to turn off my desktop and unraid server.
  13. Okay, so it sounds like I should run an extended SMART test after the parity check. If it passes, the drive should be OK as long as the read errors don't increment? Or should I RMA regardless?
  14. I tried to run an extended SMART test and it got stuck at 90% for what felt like forever. A parity check is running at the same time; I'm not sure if that has anything to do with it. The drive is still under warranty. Is it possible to RMA it?