a_bomb

Members
  • Posts

    21
Posts posted by a_bomb

  1. I recently upgraded to 6.9.2 and started getting a TON of alert notifications about my Tripp Lite UPS, which had no issues before. I corrected my email notification settings, which seems to have stopped the alerts from firing every second in the server notifications; now I just get them every 5 minutes or so.

     

    Diagnostics attached.

     

    Log is showing:

    Apr 16 19:36:24 Server kernel: usb 2-1-port5: disabled by hub (EMI?), re-enabling...
    Apr 16 19:36:24 Server kernel: usb 2-1.5: USB disconnect, device number 124
    Apr 16 19:36:24 Server kernel: usb 2-1.5: new low-speed USB device number 125 using ehci-pci
    Apr 16 19:36:24 Server kernel: hid-generic 0003:09AE:3015.00E8: hiddev96,hidraw3: USB HID v1.10 Device [Tripp Lite TRIPP LITE SMART1500RM2U ] on usb-0000:00:1d.0-1.5/input0
    Apr 16 19:36:30 Server apcupsd[17859]: Communications with UPS restored.

     

    skynet-diagnostics-20210416-1932.zip

  2. I couldn't find it there. I've taken out all but 4 sticks and am whittling it down until I don't see the errors anymore. It seems it crashed and rebooted again this morning (the last log entries before that happened are below).


    I went ahead and took out the 1050 just a moment ago as well.
     

    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc0007c000010093
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e09d4
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5ae239040
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 403e0486
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
    Mar 13 03:50:37 Skynet kernel: EDAC MC1: 31 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5ae239 offset:0x40 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc00078000010093
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e889c
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5b17cbd80
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 52768086
    Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
    Mar 13 03:50:37 Skynet kernel: EDAC MC1: 30 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5b17cb offset:0xd80 grain:32 syndrome:0x0 -  OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
    Connection reset by 192.168.1.63 port 22
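
The "Bank 7" values in the log above can be unpacked to see what the hardware is reporting. The following is a minimal sketch, assuming the standard Intel MCi_STATUS register layout (bit 63 valid, bit 62 overflow, bit 61 uncorrected, bits 52:38 corrected error count); the function name is made up for illustration. For the two values above, the count field decodes to 31 and 30, which matches the "31 CE" and "30 CE" figures on the EDAC lines, and the overflow bit matches the OVERFLOW flag.

```python
# Sketch only: decode a few fields of an Intel MCi_STATUS value as printed in
# the "Machine Check Event: 0 Bank 7: <hex>" lines. The bit positions are
# assumed from the Intel SDM machine-check layout; decode_mci_status is a
# hypothetical helper name, not part of any tool.

def decode_mci_status(hex_value: str) -> dict:
    status = int(hex_value, 16)
    return {
        "valid": bool((status >> 63) & 1),            # bit 63: register holds a valid error
        "overflow": bool((status >> 62) & 1),         # bit 62: an earlier error was overwritten
        "uncorrected": bool((status >> 61) & 1),      # bit 61: error was NOT corrected
        "corrected_count": (status >> 38) & 0x7FFF,   # bits 52:38: corrected error count
    }


if __name__ == "__main__":
    # Values copied from the log above.
    for value in ("cc0007c000010093", "cc00078000010093"):
        print(value, decode_mci_status(value))
```

With the uncorrected bit clear, these entries describe errors that ECC corrected, although the overflow bit indicates that errors are arriving faster than they are being logged.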

     

  3. Well, I got up to 11 hours of uptime after doing the above steps and then upgrading to 6.9.1.

     

    I was tailing the syslog and got this before it shut down. Going by the timestamps, it rebooted more than once. This is just what was on the terminal I had open last night. There are definitely memory errors, but it also looks like it is handling them? I'm not sure how to identify the exact sticks to pull, short of assuming Channel 0 = Channel A etc., or pulling them one by one and checking the log.

     

    I'm thinking about pulling the 1050 as well

     

    Mar 12 00:49:33 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x6a751a offset:0x340 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
    Mar 12 01:26:41 Skynet kernel: mce: [Hardware Error]: Machine check events logged
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: TSC 1a5c2d76dc3d
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: ADDR 727108f40
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: MISC 1424a5c86
    Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530401 SOCKET 1 APIC 20
    Mar 12 01:26:41 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x727108 offset:0xf40 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:2 rank:5)
    Mar 12 01:34:45 Skynet kernel: mce: [Hardware Error]: Machine check events logged
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: TSC 1b80e90da98b
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: ADDR d65da23c0
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: MISC 425a4686
    Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530885 SOCKET 1 APIC 20
    Mar 12 01:34:45 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xd65da2 offset:0x3c0 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
    Mar 12 01:51:10 Skynet kernel: mce: [Hardware Error]: Machine check events logged
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: TSC 1dd51b124c58
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: ADDR 68663a340
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: MISC 1526a5886
    Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1615531870 SOCKET 0 APIC 0
    Mar 12 01:51:10 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x68663a offset:0x340 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
    Mar 12 01:59:03 Skynet kernel: mce: [Hardware Error]: Machine check events logged
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: TSC 1ef3720de11e
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: ADDR 7289aeb00
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: MISC 4214e486
    Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615532343 SOCKET 1 APIC 20
    Mar 12 01:59:03 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x7289ae offset:0xb00 grain:32 syndrome:0x0 -  area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:1 rank:5)
    Mar 12 02:00:06 Skynet root: /etc/libvirt: 920.4 MiB (965103616 bytes) trimmed on /dev/loop3
    Mar 12 02:00:06 Skynet root: /var/lib/docker: 15.5 GiB (16609398784 bytes) trimmed on /dev/loop2
    Mar 12 02:00:06 Skynet root: /mnt/cache: 191.9 GiB (206013878272 bytes) trimmed on /dev/sdj1
    Connection reset by 192.168.1.63 port 22
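
One way to narrow down which sticks to pull is to total the corrected errors per socket/channel/DIMM from the syslog. The following is a rough sketch, assuming the EDAC line format shown above; the regex, function name, and sample list are illustrative only. It reports the EDAC location, and mapping that to a physical slot label would still need the motherboard manual.

```python
import re
from collections import Counter

# Matches summary lines such as:
#   EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (...)
# The "EDAC sbridge MCx:" detail lines are intentionally not matched.
CE_LINE = re.compile(
    r"EDAC MC\d+: (?P<count>\d+) CE .* on "
    r"CPU_SrcID#(?P<socket>\d+)_Ha#\d+_Chan#(?P<chan>\d+)_DIMM#(?P<dimm>\d+)"
)


def tally_ce_errors(lines):
    """Sum corrected-error counts keyed by (socket, channel, dimm)."""
    totals = Counter()
    for line in lines:
        m = CE_LINE.search(line)
        if m:
            key = (int(m["socket"]), int(m["chan"]), int(m["dimm"]))
            totals[key] += int(m["count"])
    return totals


# Sample input: the five summary lines from the log above, trimmed before the
# parenthesised details.
SAMPLE = [
    "Mar 12 00:49:33 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0",
    "Mar 12 01:26:41 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1",
    "Mar 12 01:34:45 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0",
    "Mar 12 01:51:10 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0",
    "Mar 12 01:59:03 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1",
]

if __name__ == "__main__":
    for (socket, chan, dimm), n in tally_ce_errors(SAMPLE).most_common():
        print(f"socket {socket} channel {chan} DIMM {dimm}: {n} corrected error(s)")
```

To run it against a full log instead of the sample, pass an open file object (for example the syslog) to `tally_ce_errors`.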

     

  4. 30 minutes ago, trurl said:

    Possibly unrelated but your appdata and system shares have files on the array. You should clean that up.

     

    What do you get from the command line with this?

    
    
    ls -lah /mnt/disk23/system

     

    root@Skynet:~# ls -lah /mnt/disk23/system
    total 0
    drwxrwxrwx 3 nobody users  20 Mar  3 21:15 ./
    drwxrwxrwx 8 nobody users 155 Mar  7 23:50 ../
    drwxrwxrwx 2 root   root   24 Mar  3 21:15 docker/

     

    So I should go ahead and move the docker/ back to the cache drive? Or just delete it? Seems there is an appdata folder on disk 23. I could just remove that since all of that data should be on the cache drive.

  5. I have also been having a lot of issues with the 6.9 upgrade.
     

    Cache drive unmountable - Samsung EVO 250GB as a cache drive

     

    I ended up wiping it before seeing the solutions about stopping the array and unmounting/mounting it (lesson learned there). I seem to have got past that and re-created almost everything I needed.

     

    I had a lot of issues with it rebooting on its own as well. Usually after about 30-40 minutes and it seemed like it would just keep doing it until I reverted back to 6.8.3.

     

    I had 6.9 stable for over 7 hours today, then started to bring back my VMs (I had to recreate the domains share and libvirt folder), and the reboots started up again, which also triggered a parity check.

     

    It just did it again, while typing this, only up for 9 minutes before a reboot this time.

     

     

    skynet-diagnostics-20210308-0015.zip

  6. I am trying a similar thing with VMware Workstation
     

    Windows 10 VM > Kali Linux VM and I get

    VMWare Win10-Kali.jpg

     

    I already did these steps after stopping the VMs and disabling VM Manager:

    Quote

    First, from command line, type the following:

     

    modprobe -r kvm_intel

     
    Then type this:

    modprobe kvm_intel nested=1
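
To confirm the reload actually took effect, the module parameter can be read back from sysfs. This is a small sketch, assuming an Intel host where `kvm_intel` exposes its `nested` parameter at the usual `/sys/module/kvm_intel/parameters/nested` path; the function name is made up for illustration.

```python
from pathlib import Path

# Usual sysfs location of the kvm_intel "nested" parameter on Linux.
NESTED_PARAM = Path("/sys/module/kvm_intel/parameters/nested")


def nested_enabled(param_path: Path = NESTED_PARAM) -> bool:
    """Return True if the parameter file reports nested virtualization as on.

    Kernels report the value as "Y"/"N" or "1"/"0" depending on version.
    """
    try:
        value = param_path.read_text().strip()
    except OSError:
        # File missing or unreadable: module not loaded, or not an Intel host.
        return False
    return value in ("Y", "1")


if __name__ == "__main__":
    print("nested virtualization enabled:", nested_enabled())
```

If this reports False after the second `modprobe`, the module may have been reloaded without the option, for example because something still had it in use when it was removed.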

     

  7. If I use the same flash drive, go through the creation tool again, and then import my backup, I won't be charged another license fee, right? I just used up the USB transfer to change to this smaller drive so that the larger one wouldn't accidentally get broken off at the back of the computer.

  8. I did put it in earlier and let it go through repair disk. I didn't take a backup, though. Put the drive back and same thing.

     

    Just ran the chkdsk on the flash drive and Windows found no problems. Made the backup. Put drive back in server and no change. 

  9. After installing a video card and trying to get it to work in a VM, the server shut down and then wouldn't boot back to the GUI. I ran the diagnostics and have attached the zip file to the post. Any help is appreciated.

     

    It looks like I have nothing in the config files, but I am not sure if that is normal for the diagnostics.

     

     
