Posts posted by rainmanjam

  1. 3 hours ago, ich777 said:

    Are you sure that your power supply is up to the task since most of the times it has to do with the power supply.

    Do you have a display connected to actually see what's going on? It would be really cool if you have a display connected and you could take a picture what's happening on screen when it actually crashes.

     

    I assume your machine is not automatically restarting?

     

    I see nothing obvious from your syslog, the driver loads fine and it should in theory be working.

    When I was hands-on and watched it happen, the power just cut out and the server restarted.

    The power supply was fine. The power CABLE doesn't meet the specs for it. 

    https://www.evga.com/support/faq/FAQdetails.aspx?faqid=59690

    Waiting on a 12 AWG cable to arrive. Sometimes you just need to get your eyes on it to figure out what's going on.

  2. 30 minutes ago, ich777 said:

    Are you sure that your power supply is up to the task since most of the times it has to do with the power supply.

    Do you have a display connected to actually see what's going on? It would be really cool if you have a display connected and you could take a picture what's happening on screen when it actually crashes.

     

    I assume your machine is not automatically restarting?

     

    I see nothing obvious from your syslog, the driver loads fine and it should in theory be working.

    No display connected so I can't see what's going on. I can connect one up.
    I tailed the syslog via SSH but nothing stands out before crashing.
    I have a 1200w power supply.
     

  3. I'm having an issue where, consistently, when I use the Nvidia drivers for anything like running an LLM or even Hashcat for testing, Unraid crashes and I have to restart.
     

    root@Tower:~# nvidia-smi -l
    Wed Mar  6 11:26:35 2024
    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.40.07              Driver Version: 550.40.07      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA GeForce RTX 3090        Off |   00000000:65:00.0 Off |                  N/A |
    |  0%   62C    P0            119W /  420W |       0MiB /  24576MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+


     

    tower-diagnostics-20240306-1128.zip
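
    Since nothing stood out in the syslog, one option is to pull the power and temperature numbers out of the nvidia-smi status row so they can be logged over time. The following is a minimal sketch, assuming the default table layout shown in the output above; the names ROW and parse_status_row are made up for illustration, and other driver versions may format the row differently.

```python
import re

# Matches the status row of the default nvidia-smi table, e.g.
# "|  0%   62C    P0   119W /  420W |  0MiB /  24576MiB |  0%  Default |"
# Assumption: fan, temperature, perf state and power fields appear in this order.
ROW = re.compile(
    r"\|\s*(?P<fan>\d+)%\s+(?P<temp>\d+)C\s+(?P<perf>P\d+)\s+"
    r"(?P<power>\d+)W\s*/\s*(?P<cap>\d+)W\s*\|"
)

def parse_status_row(line):
    """Return fan, temperature, perf state and power fields, or None if no match."""
    m = ROW.search(line)
    if m is None:
        return None
    return {
        "fan_pct": int(m["fan"]),
        "temp_c": int(m["temp"]),
        "perf": m["perf"],
        "power_w": int(m["power"]),
        "cap_w": int(m["cap"]),
    }

# The status row copied from the nvidia-smi output in the post above.
sample = "|  0%   62C    P0            119W /  420W |       0MiB /  24576MiB |      0%      Default |"
row = parse_status_row(sample)
print(row)
```

    Appending each parsed row to a file on persistent storage would leave a record of the last readings before a hard power cut, which a tail over SSH cannot show once the machine is down.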
