Dear All,
Could someone kindly take a look at my issue? I have been trying to understand it myself before posting, but honestly I am having a little trouble, so here I am earlier than I would have liked.
Log entries that seemed interesting:
Dec 18 09:22:29 DATTOWER kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
Dec 18 09:22:29 DATTOWER kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)
Dec 18 09:13:03 DATTOWER kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)
Dec 18 09:13:03 DATTOWER kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
Dec 18 09:13:03 DATTOWER kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)
Dec 18 09:13:03 DATTOWER kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Dec 18 09:13:03 DATTOWER kernel: nvidia-uvm: Loaded the UVM driver, major device number 238.
Dec 18 09:06:06 DATTOWER kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Dec 18 09:06:06 DATTOWER kernel: DMI: ASUSTeK COMPUTER INC. System Product Name/Pro WS W680-ACE IPMI, BIOS 2703 08/11/2023
Dec 18 09:06:06 DATTOWER kernel: ACPI: Early table checksum verification disabled
Dec 18 09:06:06 DATTOWER kernel: PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
Dec 18 09:06:06 DATTOWER kernel: acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
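In case it helps anyone sift through a longer log, here is a rough sketch of how the ACPI error lines above could be pulled out and counted by error code. It is only an illustration, not something from the diagnostics themselves: the function name `count_acpi_errors` and the regular expression are my own assumptions, based purely on the line format shown above, and the log file path has to be supplied by whoever runs it.

```python
import re
import sys
from collections import Counter

# Assumed pattern, derived only from the lines quoted above, e.g.
# "... kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure ..."
# "... kernel: ACPI BIOS Error (bug): Failure creating named object ..., AE_ALREADY_EXISTS ..."
ACPI_LINE = re.compile(r"kernel: ACPI (?:BIOS )?Error\b.*?(AE_[A-Z_]+)")


def count_acpi_errors(lines):
    """Count ACPI error lines per AE_* status code (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ACPI_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # The log path is passed on the command line; no default location is assumed.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for code, total in count_acpi_errors(handle).most_common():
            print(f"{code}: {total}")
```

Lines that only mention ACPI without an error, such as the "Early table checksum verification disabled" entry, are deliberately not counted.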
Downgrading the GPU driver to the Production branch (v535.146.02) and switching the Docker network type from macvlan to ipvlan seemed to help, and the system is running a little more reliably. However, it only became unusable in the last 24 hours, so it may be too soon to say.
The system hangs with a few CPU cores pinned. Shutting down Docker and attempting to stop the array does not help. The system does not respond to graceful shutdown signals, so a manual reboot is required.
What do we need to diagnose this?
Kindest regards,
Dan