June 2, 2020

I recently installed and configured UNRAID, and so far I am happy. I am in the middle of a large hard drive copy job, so I can't reboot my box, but out of nowhere I started getting these UPS/USB errors. It had been working fine for a few days. Any idea where I should start troubleshooting? Diagnostics attached.

[239484.482949] usb 1-1.1: USB disconnect, device number 3
[239484.654891] usb 1-1.1: new full-speed USB device number 6 using ehci-pci
[239484.740036] hid-generic 0003:0764:0501.0004: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[239486.274950] usb 1-1.1: USB disconnect, device number 6
[239486.447874] usb 1-1.1: new full-speed USB device number 7 using ehci-pci
[239486.531995] hid-generic 0003:0764:0501.0005: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[239662.915026] usb 1-1.1: USB disconnect, device number 7
[239663.088714] usb 1-1.1: new full-speed USB device number 8 using ehci-pci
[239663.174138] hid-generic 0003:0764:0501.0006: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[239666.243030] usb 1-1.1: USB disconnect, device number 8
[239666.415721] usb 1-1.1: new full-speed USB device number 9 using ehci-pci
[239666.501062] hid-generic 0003:0764:0501.0007: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[240019.267176] usb 1-1.1: USB disconnect, device number 9
[240019.439389] usb 1-1.1: new full-speed USB device number 10 using ehci-pci
[240019.524871] hid-generic 0003:0764:0501.0008: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[240021.315170] usb 1-1.1: USB disconnect, device number 10
[240021.487384] usb 1-1.1: new full-speed USB device number 11 using ehci-pci
[240021.572741] hid-generic 0003:0764:0501.0009: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[241831.858564] mce: [Hardware Error]: Machine check events logged
[241831.858570] mce: [Hardware Error]: Machine check events logged
[243974.212849] usb 1-1.1: USB disconnect, device number 11
[243974.386787] usb 1-1.1: new full-speed USB device number 12 using ehci-pci
[243974.471684] hid-generic 0003:0764:0501.000A: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[243976.004825] usb 1-1.1: USB disconnect, device number 12
[243976.177773] usb 1-1.1: new full-speed USB device number 13 using ehci-pci
[243976.263115] hid-generic 0003:0764:0501.000B: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[244848.453167] usb 1-1.1: USB disconnect, device number 13
[244848.625424] usb 1-1.1: new full-speed USB device number 14 using ehci-pci
[244848.710664] hid-generic 0003:0764:0501.000C: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[244944.453220] usb 1-1.1: USB disconnect, device number 14
[244944.626880] usb 1-1.1: new full-speed USB device number 15 using ehci-pci
[244944.712224] hid-generic 0003:0764:0501.000D: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[246459.461851] usb 1-1.1: USB disconnect, device number 15
[246459.633977] usb 1-1.1: new full-speed USB device number 16 using ehci-pci
[246459.719298] hid-generic 0003:0764:0501.000E: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[246461.509855] usb 1-1.1: USB disconnect, device number 16
[246461.681991] usb 1-1.1: new full-speed USB device number 17 using ehci-pci
[246461.767208] hid-generic 0003:0764:0501.000F: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[246757.189978] usb 1-1.1: USB disconnect, device number 17
[246766.065367] usb 1-1.1: new full-speed USB device number 18 using ehci-pci
[246766.150629] hid-generic 0003:0764:0501.0010: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[247004.742083] usb 1-1.1: USB disconnect, device number 18
[247004.915644] usb 1-1.1: new full-speed USB device number 19 using ehci-pci
[247005.000994] hid-generic 0003:0764:0501.0011: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[247502.150303] usb 1-1.1: USB disconnect, device number 19
[247502.323180] usb 1-1.1: new full-speed USB device number 20 using ehci-pci
[247502.409008] hid-generic 0003:0764:0501.0012: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[247922.758462] usb 1-1.1: USB disconnect, device number 20
[247922.932383] usb 1-1.1: new full-speed USB device number 21 using ehci-pci
[247923.017295] hid-generic 0003:0764:0501.0013: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[247924.806468] usb 1-1.1: USB disconnect, device number 21
[247924.979399] usb 1-1.1: new full-speed USB device number 22 using ehci-pci
[247925.064851] hid-generic 0003:0764:0501.0014: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0
[248787.270822] usb 1-1.1: USB disconnect, device number 22
[248787.443533] usb 1-1.1: new full-speed USB device number 23 using ehci-pci
[248787.528745] hid-generic 0003:0764:0501.0015: hiddev96,hidraw0: USB HID v1.10 Device [CPS CST135XLU] on usb-0000:00:16.0-1.1/input0

nas-unraid-diagnostics-20200602-1305.zip

Edited June 3, 2020 by johnwhicker
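For anyone wanting to see how the disconnects in a log like this are spaced, here is a minimal Python sketch. It assumes the kernel log has been saved as plain text in the same format as the lines posted above; the names `usb_disconnects` and `gaps` are made up for illustration and are not part of any Unraid tool.

```python
import re

# Matches lines such as:
# [239484.482949] usb 1-1.1: USB disconnect, device number 3
DISCONNECT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+usb\s+(\S+):\s+USB disconnect, device number (\d+)"
)


def usb_disconnects(log_text):
    """Return (uptime_seconds, port, device_number) for each disconnect line."""
    return [
        (float(ts), port, int(dev))
        for ts, port, dev in DISCONNECT_RE.findall(log_text)
    ]


def gaps(events):
    """Seconds elapsed between consecutive disconnects, rounded to ms."""
    times = [t for t, _, _ in events]
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# A few lines copied from the log in this thread.
sample = """\
[239484.482949] usb 1-1.1: USB disconnect, device number 3
[239484.654891] usb 1-1.1: new full-speed USB device number 6 using ehci-pci
[239486.274950] usb 1-1.1: USB disconnect, device number 6
[239662.915026] usb 1-1.1: USB disconnect, device number 7
"""

events = usb_disconnects(sample)
print(len(events), "disconnects on ports", {port for _, port, _ in events})
print("gaps in seconds:", gaps(events))
```

If every event is on the same port (here `1-1.1`) and the gaps are irregular, that fits an intermittent cable or port problem, but the script only summarizes the log; it does not prove the cause.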
June 3, 2020

Those USB disconnects suggest a poor cable connection at the motherboard or at the UPS. Also try a different port on the server.

More importantly though, in the middle of all that happening you also have

Jun 2 10:55:15 NAS-UNRAID kernel: mce: [Hardware Error]: Machine check events logged

You should install mcetools via NerdPack and then, at the terminal, post the output of mcelog
June 3, 2020 (Author)

5 hours ago, Squid said:
    Those USB's suggest poor connections at the motherboard / UPS for the cable. Also try a different port on the server
    More importantly though, in the middle of all that happening you also have
    Jun 2 10:55:15 NAS-UNRAID kernel: mce: [Hardware Error]: Machine check events logged
    You should install mcetools via NerdPack and then at the terminal, post the output of mcelog

I appreciate it, and someone else pointed me to that as well. It looks like a corrected ECC RAM error. I will run a full, extensive memtest after I finish this long array copy job. Here is the mcelog output. Any other ideas?

root@NAS-UNRAID:~# mcelog
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 5
ADDR 22f43abc0
TIME 1591113315 Tue Jun 2 10:55:15 2020
MCG status:
MCi status:
Corrected error
Error enabled
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS 9400004000910091 MCGSTATUS 0
MCGCAP 806 APICID 0 SOCKETID 0
MICROCODE 12d
CPUID Vendor Intel Family 6 Model 77
Hardware event. This is not a software error.
MCE 1
CPU 1 BANK 5
ADDR 22f43abc0
TIME 1591113315 Tue Jun 2 10:55:15 2020
MCG status:
MCi status:
Corrected error
Error enabled
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS 9400004000910091 MCGSTATUS 0
MCGCAP 806 APICID 2 SOCKETID 0
MICROCODE 12d
CPUID Vendor Intel Family 6 Model 77
root@NAS-UNRAID:~#

Edited June 3, 2020 by johnwhicker
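As a cross-check on what mcelog printed, the STATUS value can be decoded by hand. The sketch below is only an illustration: it assumes the architectural bit layout of the Intel IA32_MCi_STATUS register (flag bits 63 to 57, MCA error code in bits 15 to 0) and the compound memory controller error code format `0000 0000 1MMM CCCC`. The function name `decode_status` is hypothetical, and model-specific fields are deliberately ignored.

```python
# Architectural flag bits of IA32_MCi_STATUS (assumed layout, see lead-in).
STATUS_FLAGS = {
    63: "VAL",    # register contains valid information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MISC register valid
    58: "ADDRV",  # ADDR register valid
    57: "PCC",    # processor context corrupt
}

# MMM field of a memory controller error code.
MEMORY_OPS = {
    0: "generic",
    1: "read",
    2: "write",
    3: "address/command",
    4: "scrub",
}


def decode_status(status):
    """Decode the flag bits and MCA error code of a machine check STATUS value."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    code = status & 0xFFFF
    result = {"flags": flags, "mca_code": code, "corrected": "UC" not in flags}
    # Memory controller errors have the form 0000 0000 1MMM CCCC.
    if (code & 0xFF80) == 0x0080:
        result["memory_op"] = MEMORY_OPS.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        result["channel"] = None if channel == 0xF else channel  # 0xF: unspecified
    return result


# STATUS value copied from the mcelog output in this thread.
info = decode_status(0x9400004000910091)
print(info)
```

For the value posted here, this yields the flags VAL, EN and ADDRV with UC clear, a memory read operation, and channel 1, which lines up with the "Corrected error", "MCi_ADDR register valid" and "RD_CHANNEL1_ERR" lines mcelog reported.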