skler

Members
  • Posts: 44
  • Joined
  • Last visited
skler's Achievements

Rookie (2/14) · Reputation: 4 · Community Answers: 1
  1. I can do this tomorrow; I'm restoring some old hardware. Btw, thank you in the meantime.
  2. I've tried the open-source drivers too:

        [ 256.084894] nvidia-uvm: Loaded the UVM driver, major device number 236.
        [ 257.340047] NVRM: kgspInitRm_IMPL: unexpected WPR2 already up, cannot proceed with booting gsp
        [ 257.340054] NVRM: kgspInitRm_IMPL: (the GPU is likely in a bad state and may need to be reset)
        [ 257.340060] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
        [ 257.343106] NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x62:0x40:1784)
        [ 257.345186] NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 0

     Do you think the GPU could be broken?

     littleboy-diagnostics-20240423-2250.zip
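     The kernel log above says the GPU "may need to be reset"; before concluding the card is dead, one option is a PCIe-level remove/rescan via the standard sysfs interface (a minimal sketch, assuming the modules can be unloaded; 0000:af:00.0 is the bus address from the log):

        # unload the NVIDIA modules so nothing holds the device
        modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia

        # drop the device from the bus, then rescan so the kernel re-enumerates it
        echo 1 > /sys/bus/pci/devices/0000:af:00.0/remove
        echo 1 > /sys/bus/pci/rescan

        # reload the driver and see whether the card initializes this time
        modprobe nvidia
        nvidia-smi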
  3. Done, diagnostics attached (without the nvidia.conf params).

     It's the only PCIe x16 slot, so it should be compatible: https://docs.nvidia.com/certification-programs/nvidia-certified-systems/index.html

     The T4 doesn't have one.

     Done, ok thanks.

     Still no device found.

     littleboy-diagnostics-20240423-2225.zip
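     Since the device is still not found, a quick sanity check is whether it enumerates on the PCIe bus at all (a minimal sketch; af:00.0 is the address from the kernel logs in this thread):

        # confirm the kernel sees the card on the bus and which driver is bound
        lspci -nnk -s af:00.0

        # full details, including the negotiated link speed and width
        lspci -vv -s af:00.0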
  4. Hi All, I've upgraded my old P400 to a T4, but the new card is not recognized.

        Apr 23 19:08:28 littleboy kernel: NVRM: GPU at PCI:0000:af:00: GPU-36d51216-544d-71c6-0604-11d08f217cd0
        Apr 23 19:08:28 littleboy kernel: NVRM: Xid (PCI:0000:af:00): 140, pid='<unknown>', name=<unknown>, An uncorrectable ECC error detected (possible firmware handling failure) DRAM:-1840691974, LTC:0, MMU:0, PCIE:0
        Apr 23 19:08:28 littleboy kernel: NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x62:0x40:2523)
        Apr 23 19:08:28 littleboy kernel: NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 0
        Apr 23 19:08:28 littleboy kernel: [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x0000af00] Failed to allocate NvKmsKapiDevice
        Apr 23 19:08:28 littleboy kernel: [drm:nv_drm_register_drm_device [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x0000af00] Failed to register device
        [...]
        Apr 23 19:12:10 littleboy kernel: NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x62:0x40:2523)
        Apr 23 19:12:10 littleboy kernel: NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 0
        Apr 23 19:12:10 littleboy kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
        Apr 23 19:12:10 littleboy kernel: nvidia-uvm: Loaded the UVM driver, major device number 235.
        Apr 23 19:12:10 littleboy kernel: NVRM: GPU 0000:af:00.0: RmInitAdapter failed! (0x62:0x40:2523)
        Apr 23 19:12:10 littleboy kernel: NVRM: GPU 0000:af:00.0: rm_init_adapter failed, device minor number 0

     Running the nvidia-smi command I get this output:

        # nvidia-smi
        No devices were found

        # lsmod | grep nvidia
        nvidia_uvm           4644864  0
        nvidia_drm             90112  0
        nvidia_modeset       1347584  1 nvidia_drm
        nvidia              54116352  2 nvidia_uvm,nvidia_modeset
        video                  61440  1 nvidia_modeset
        drm_kms_helper        167936  4 mgag200,nvidia_drm
        drm                   499712  6 drm_kms_helper,drm_shmem_helper,nvidia,mgag200,nvidia_drm
        backlight              20480  3 video,drm,nvidia_modeset
        i2c_core               86016  9 drm_kms_helper,i2c_algo_bit,igb,nvidia,mgag200,i2c_smbus,i2c_i801,ipmi_ssif,drm

     I deleted the GPU Stats plugin, reinstalled the drivers, and rebooted a couple of times, but the card is still not recognized. I read a bit around the forum and tried what I found. The GPU is also recognized in the BIOS/iDRAC.

     Is it possible to fix? Could the GPU be broken?

     littleboy-diagnostics-20240417-1506.zip
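     For reference, Xid 140 is an uncorrectable ECC error, so it is worth checking how often it recurs and, if the driver ever initializes the card, what the ECC counters say (a minimal sketch with standard commands; nvidia-smi can only answer once a device is found):

        # list the Xid events the kernel has logged since boot
        dmesg | grep -i xid

        # if the card initializes, dump its ECC error counters
        nvidia-smi -q -d ECC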
  5. Will a forced reboot invalidate the parity? And if the parity is invalid, will the data from the broken disk be lost?
  6. Today I have a broken disk. I'm trying to stop the Docker service and the array, but when I try to stop Docker I end up in the following situation: Docker should be down, but its status is still "running".

     What can I do to replace the disk, or to force the array to stop?

     littleboy-diagnostics-20240417-1506.zip
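     When a stop hangs like this, it is usually a process still holding files open under /mnt; finding it from the console is a reasonable first step (a minimal sketch with standard tools, not an official Unraid procedure):

        # list processes with open files on the array disks or user shares
        lsof /mnt/disk* /mnt/user 2>/dev/null

        # same check with fuser, showing the owning user and access mode
        fuser -vm /mnt/disk* 2>/dev/null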
  7. Can I manually change it without breaking everything? So basically, can I do steps 1-3 all together and then run the parity check on everything?
  8. I would like to reorder the disk positions, add a second parity disk, and add a few disks to the Unraid array. I have ZFS on all disks, and I'm on Unraid 6.12.10 with a single-parity array. Is this the right (and fastest) procedure?

     Step 1 - Rearrange disks
     • Stop the array
     • Tools -> New Config (selecting the option to retain all current assignments)
     • Move the disks
     • Set Parity as Valid
     • Commit the changes and start the array

     Question 1: My ZFS pools are named diskX/share-name; for example, I have disk2/isos on Disk2. If I move this disk to slot 1, what happens to the ZFS pool: will it be renamed, or will Disk1 end up with the disk2/isos pool? If it is not renamed automatically, can I do it manually? (See the sketch after this post.)

     Step 2 - Add second parity
     • Stop the array
     • Add the 2nd parity disk
     • Start the array and calculate the parity on Parity Disk 2

     Question 2: Will this also check the parity on Parity Disk 1?

     Question 3: Once I have two parity disks, will the parity calculation be done on both disks? In that case, does it make sense to merge Step 1 and Step 2 and calculate the parity on both drives at once?

     Step 3 - Add new disks
     • Preclear the new disks
     • Stop the array
     • Add the new disks
     • Start the array (without recalculating parity)

     Question 4: Will parity2 still be valid after adding new "zeroed" disks to the array?
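     On Question 1, if a manual rename were ever needed, plain ZFS supports renaming a pool on import (a minimal sketch of the generic ZFS mechanism, not of how Unraid itself handles moved disks; disk2 and disk1 are the pool names from the question):

        # export the pool under its old name, then re-import it under the new one
        zpool export disk2
        zpool import disk2 disk1

        # datasets follow the pool, so disk2/isos becomes disk1/isos
        zfs list -r disk1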
  9. Any update on this?
  10. I didn't test with a reboot, an array stop, or a Docker/VM stop. I suppose some Docker container was using this resource. But I solved it, without shutting anything down, with this:

         root@littleboy:~# zfs set mountpoint=none disk3/backup
         root@littleboy:~# zfs destroy -vr disk3/backup
         will destroy disk3/backup@littleboy_2023-12-19-04:40:40
         will destroy disk3/backup@littleboy_2024-01-07-04:41:17
         will destroy disk3/backup@littleboy_2024-01-09-04:41:07
         will destroy disk3/backup@littleboy_2023-12-21-04:41:06
         will destroy disk3/backup@littleboy_2023-12-22-04:41:06
         will destroy disk3/backup@littleboy_2024-01-05-04:41:21
         will destroy disk3/backup
         root@littleboy:~#
  11. I can't delete a ZFS dataset and its snapshots; what can I do?

         # zfs destroy -vfrR disk3/backup
         will destroy disk3/backup@littleboy_2023-12-19-04:40:40
         will destroy disk3/backup@littleboy_2024-01-07-04:41:17
         will destroy disk3/backup@littleboy_2024-01-09-04:41:07
         will destroy disk3/backup@littleboy_2023-12-21-04:41:06
         will destroy disk3/backup@littleboy_2023-12-22-04:41:06
         will destroy disk3/backup@littleboy_2024-01-05-04:41:21
         will destroy disk3/backup
         cannot destroy snapshot disk3/backup@littleboy_2023-12-19-04:40:40: dataset is busy
         cannot destroy snapshot disk3/backup@littleboy_2024-01-07-04:41:17: dataset is busy
         cannot destroy snapshot disk3/backup@littleboy_2024-01-09-04:41:07: dataset is busy
         cannot destroy snapshot disk3/backup@littleboy_2023-12-21-04:41:06: dataset is busy
         cannot destroy snapshot disk3/backup@littleboy_2023-12-22-04:41:06: dataset is busy
         cannot destroy snapshot disk3/backup@littleboy_2024-01-05-04:41:21: dataset is busy

      lsof shows nothing:

         # lsof /mnt/disk3/backup

      Could it be SMB?
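      A "dataset is busy" on snapshots that lsof cannot explain is often a ZFS-level hold or a clone rather than an open file; both can be checked from the shell (a minimal sketch using standard ZFS commands):

         # list user holds on every snapshot of the dataset
         zfs list -H -t snapshot -o name -r disk3/backup | xargs -r -n1 zfs holds

         # check whether any snapshot has been cloned (a clone also blocks destroy)
         zfs list -H -t snapshot -o name,clones -r disk3/backup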
  12. OK, at the moment everything is fine, but I started it only a couple of minutes ago.

      Great advice! I've now understood a bit of how parity works, and what the in-place rebuild and the new disk do. What is still not clear to me is the error handling: doesn't parity take care of errors? If an error occurs, is the parity in error too, so it can no longer be used?

      Another thing is about the disk report: if a disk is going to fail, can that be sent as a notification in the array report?

      Things seem good. Btw, diagnostics attached (I did a reboot before creating the new config).

      littleboy-diagnostics-20240108-1924.zip
  13. Parity calculation ongoing, all data present! ❤️
  14. PS: is it possible to back up the current config when I create a new config?
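      On Unraid the array configuration lives on the flash drive under /boot/config (super.dat there holds the disk assignments), so copying that directory before running New Config preserves it (a minimal sketch; the destination path is just an example):

         # snapshot the current config to a dated folder on the flash drive
         cp -r /boot/config /boot/config.bak-$(date +%Y%m%d)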