ldrax

Members

  • Posts: 94
  • Joined
  • Last visited
  • Gender: Undisclosed


ldrax's Achievements

  • Apprentice (3/14)
  • Reputation: 0

  1. Just want to express my newfound happiness. My motherboard has only one USB controller (to my disappointment), but the USB-C port on the 2080 Ti comes to the rescue! On 6.11.5, I just need to bind the NVIDIA USB controller to VFIO (on System Devices), restart, and then pass it through. It works! I attached a Dell WD19TB dock that's been lying around forever; it's a Thunderbolt dock, but it works over plain USB-C. I also attached another USB hub to the WD19TB. This gives the Windows 11 guest OS a dozen USB ports for plug-and-play use. Transfer rates are fast, though of course the bandwidth is shared among all the plugged-in devices. I guess there is an overall bus power limit as well, but just to test, I plugged in three external SSDs together with a bunch of thumb drives and an SD card. This made my day!
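     A minimal sketch of how that binding can be checked from the console, assuming the usual Unraid layout; the PCI address 0a:00.2 is a placeholder for the 2080 Ti's USB controller (the real address is shown on the System Devices page):

       # List the GPU's sub-functions; the xHCI entry is the card's USB-C controller.
       lspci -nn | grep -i nvidia
       # After ticking it under Tools > System Devices and rebooting, the ID is
       # recorded in the VFIO config file and the controller is bound to vfio-pci.
       cat /boot/config/vfio-pci.cfg
       lspci -nnk -s 0a:00.2    # expect "Kernel driver in use: vfio-pci"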
  2. I probably should do the preclear again because, for a reason that now escapes me, I opted to 'Skip pre-read' and 'Skip post-read' when I started the preclear that night.
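     A minimal sketch of one way to check by hand roughly what the skipped post-read would have verified, i.e. that the data area reads back as all zeros; /dev/sdX is a placeholder for the precleared disk, and the first 512 bytes are skipped on the assumption that the preclear signature sits in the MBR:

       # Compare everything after the MBR against the zero stream; this reads
       # the whole drive, so it takes about as long as a full read pass.
       cmp <(dd if=/dev/sdX bs=1M skip=512 iflag=skip_bytes 2>/dev/null) /dev/zero
       # An all-zero drive ends with an "EOF" message from cmp; any
       # "differ: byte N" output means a non-zero byte was found.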
  3. This is the 'Disk Log Information' from the Unassigned Devices menu. Note that this log only covers the period after the reboot described in point 7 above. The errors at Nov 21 23:26-23:27 were logged just after the preclear started (point 9 above). sdu-log.txt
  4. I need to find a way to get cleaner diagnostics. Currently a bunch of DIY scripts are contributing syslog messages that are not exactly meant to be public. I can post the 'Disk Log Information' if that helps.
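     A minimal sketch of one way to get that cleaner view, by pulling only the kernel/disk lines out of the syslog and leaving the DIY-script noise behind (the path is the standard Unraid syslog location):

       # Keep only kernel messages that mention an ata link or an sd device and
       # save them to a small file that can be attached instead of full diagnostics.
       grep -E 'kernel:.*(ata[0-9]+|sd[a-z]+)' /var/log/syslog > /tmp/disk-events.txt
       wc -l /tmp/disk-events.txt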
  5. Hi all, I have a situation as described below. Basically, I want to know whether I should be confident enough to keep using the disk in question, in its current physical location and SATA slot. My plan was to shrink the array, unassign disk10, and use it as an unassigned device.
     1. During a non-correcting parity check, disk10 (/dev/sdu) reported rapidly increasing UDMA CRC errors (the count went from 0 to 3000+).
     2. Turned off the system and replaced the SATA cable (it is actually a 4-way breakout cable from a JBOD PCIe controller; the order was preserved, i.e. it is still the "3rd cable").
     3. Parity check finished without errors and without new UDMA CRC errors.
     4. Disk10's content was emptied without errors, and the empty shared folder was deleted.
     5. Following the 'Shrink array' procedure, https://wiki.unraid.net/Shrink_array#The_.22Clear_Drive_Then_Remove_Drive.22_Method, I created the clear-me directory and started the clear_array_drive script. (I've done this procedure multiple times in the past when shrinking my array.)
     6. Syslog rapidly reported a bunch of write errors on disk10, and Unraid subsequently disabled disk10.
     7. Stopped the clear_array_drive script, rebooted the system, did New Config, and unassigned disk10.
     8. Rebuilt parity with the remaining 9 disks, no errors.
     9. Precleared sdu (formerly disk10); it started with a few write errors logged to syslog, but there were no further errors for the next 14+ hours until the preclear finished with a success message.
     10. Ran a short SMART test and then an extended SMART test on sdu; both passed.
     11. In Unassigned Devices, formatted sdu with XFS and did a copy test of a few GBs; no errors reported.
     I intend to use this disk in Unassigned Devices as a 'staging disk' for large content work. Should I be worried? Thank you in advance.
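     A minimal sketch of the SMART checks behind steps 1 and 10, run from the console with the stock smartctl; /dev/sdu is the disk from the post:

       # The interface CRC counter never resets, so what matters is whether it
       # keeps climbing after the cable swap, not its absolute value.
       smartctl -A /dev/sdu | grep -i crc
       # Queue the short and, later, the extended self-test, then read the results.
       smartctl -t short /dev/sdu
       smartctl -t long /dev/sdu
       smartctl -a /dev/sdu | grep -A 10 'Self-test log'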
  6. Done, reposted there. I will delete this post shortly. Thanks @johnnie.black!
  7. Recently I noticed many occurrences of 'unable to fork, resources not available' errors, and the affected processes failed to execute. Suspecting an out-of-memory issue (I have 32GB), I watched the top command for a while as these errors were occurring, but it doesn't look that way. Surprisingly, the 'ps' command shows there are 30k+ nv_queue processes:
     -- truncated --
     32757 ?        S      0:00 [nv_queue]
     32758 ?        S      0:00 [nv_queue]
     32759 ?        S      0:00 [nv_queue]
     32760 ?        S      0:00 [nv_queue]
     32761 ?        S      0:00 [nv_queue]
     32762 ?        S      0:00 [nv_queue]
     32763 ?        S      0:00 [nv_queue]
     32764 ?        S      0:00 [nv_queue]
     32765 ?        S      0:00 [nv_queue]
     32766 ?        S      0:00 [nv_queue]
     32767 ?        S      0:00 [nv_queue]
     # ps ax | grep nv_queue | wc -l
     31198
     This might come from the nvidia driver. I have been using the Nvidia build for a long time, but the only thing I have done differently recently is run nvidia-smi -pm 1 to enable persistence mode. Without this command, my graphics card (1080 Ti) always sits at about 55W when idle; with it, it drops to 9-12W. Does anyone encounter this issue? I believe the very large number of nv_queue processes has rendered the system unstable, depriving many other important processes of resources. I have rebooted since and have NOT run the -pm 1 command again. Many hours later, there is no such issue, so it is more or less that command that is responsible. But of course the 55W idle power draw is now back to being an issue.
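     A minimal sketch of how the two states can be compared from the console, using only standard nvidia-smi and ps calls:

       # Toggle persistence mode and watch the reported idle power draw.
       nvidia-smi -pm 0
       nvidia-smi --query-gpu=persistence_mode,power.draw --format=csv
       # Count the nv_queue kernel threads; a handful is normal, tens of
       # thousands is the runaway condition described above.
       ps ax | grep -c '\[nv_queue\]'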
  8. Recently I noticed many occurrences of 'unable to fork' errors. Suspecting an out-of-memory issue, I watched the top command for a while, but it doesn't look that way. Surprisingly, the 'ps' command shows there are 30k+ nv_queue processes:
     -- truncated --
     32757 ?        S      0:00 [nv_queue]
     32758 ?        S      0:00 [nv_queue]
     32759 ?        S      0:00 [nv_queue]
     32760 ?        S      0:00 [nv_queue]
     32761 ?        S      0:00 [nv_queue]
     32762 ?        S      0:00 [nv_queue]
     32763 ?        S      0:00 [nv_queue]
     32764 ?        S      0:00 [nv_queue]
     32765 ?        S      0:00 [nv_queue]
     32766 ?        S      0:00 [nv_queue]
     32767 ?        S      0:00 [nv_queue]
     # ps ax | grep nv_queue | wc -l
     31198
     This might come from the nvidia driver. I have been using the Nvidia build for a long time, but the only thing I have done differently recently is run nvidia-smi -pm 1 to enable persistence mode. Without this command, my graphics card (1080 Ti) always sits at about 55W when idle; with it, it drops to 9-12W. Does anyone encounter this issue? I believe the very large number of nv_queue processes has rendered the system unstable, causing many other important processes to fail to execute due to lack of resources.
  9. I see. So a possible cause is that it's underloaded. Thanks @Benson
  10. I noticed a strange behaviour of my PSU (SilverStone Strider 1200W Platinum, ST1200). The PSU is installed with its fan facing downwards, drawing air from the bottom of the case (the Corsair 760T has an intake honeycomb with a filter in that bottom position). For a long time I had the HDD spin-down delay set to 'Never', i.e. disabled; recently I enabled it. While the disks stay spun down, the PSU periodically ramps up its fan speed for about 10-20 seconds before going quiet again. When I spin the HDDs back up, the PSU stays quiet the whole time. This is rather strange, as spun-down HDDs should lower the overall case temperature. The only explanation I can think of is that the PSU 'compares' its internal temperature with the interior case temperature, and if it thinks it is too hot relative to the case, it ramps up its fan speed. Does anyone notice similar behaviour? Any idea what I can do?
  11. Done, looks like all errors are corrected. Thanks!
      scrub status for 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
          scrub started at Mon Mar 30 16:21:50 2020 and finished after 00:26:03
          total bytes scrubbed: 1.87TiB with 300965 errors
          error details: csum=300965
          corrected errors: 300965, uncorrectable errors: 0, unverified errors: 0
  12. The BTRFS scrub command (non-repairing, yet), however, shows a lot of errors found:
      scrub status for 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
          scrub started at Mon Mar 30 15:52:07 2020, running for 00:01:21
          total bytes scrubbed: 121.53GiB with 14191 errors
          error details: csum=14191
          corrected errors: 0, uncorrectable errors: 0, unverified errors: 0
      (in progress)
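     A minimal sketch of the scrub invocations behind the two posts above, assuming the pool is mounted at the usual Unraid cache path /mnt/cache:

       # Read-only pass: count checksum errors without changing anything.
       btrfs scrub start -r /mnt/cache
       btrfs scrub status /mnt/cache
       # Repairing pass: rewrites bad blocks from a good copy, which needs a
       # redundant profile such as the raid1 layout of a multi-device cache pool.
       btrfs scrub start /mnt/cache
       btrfs scrub status /mnt/cache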
  13. So before running the --clear-space-cache command, I started the array in normal mode to be able to back up some selected files from the cache pool. While doing this, there were a lot of error messages in the syslog (including messages about correcting them), as well as messages about rebuilding the space cache:
      Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4907223482368, rebuilding it now
      Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4158791876608, rebuilding it now
      Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4512052936704, rebuilding it now
      Mar 30 15:18:56 gpt760t kernel: BTRFS error (device sdh1): csum mismatch on free space cache
      Mar 30 15:18:56 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4999565279232, rebuilding it now
      Mar 30 15:19:19 gpt760t kernel: io_ctl_check_generation: 21 callbacks suppressed
      Mar 30 15:19:19 gpt760t kernel: BTRFS error (device sdh1): space cache generation (126117) does not match inode (126155)
      Mar 30 15:19:19 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4959836831744, rebuilding it now
      Mar 30 15:19:19 gpt760t kernel: BTRFS error (device sdh1): space cache generation (126115) does not match inode (126182)
      Mar 30 15:19:19 gpt760t kernel: BTRFS warning (device sdh1): failed to load free space cache for block group 4998491537408, rebuilding it now
      --- truncated, hundreds of these same messages ---
      Once the backup was completed, I restarted the array in maintenance mode and ran a check --readonly, just to check. All the previous errors are now gone:
      [1/7] checking root items
      [2/7] checking extents
      [3/7] checking free space cache
      [4/7] checking fs roots
      [5/7] checking only csums items (without verifying data)
      [6/7] checking root refs
      [7/7] checking quota groups skipped (not enabled on this FS)
      Opening filesystem to check...
      Checking filesystem on /dev/sdh1
      UUID: 3c12a05c-3bba-493e-98e5-d2d3a2c7e107
      found 1030725935104 bytes used, no error found
      total csum bytes: 603511636
      total tree bytes: 1761673216
      total fs tree bytes: 533708800
      total extent tree bytes: 218890240
      btree space waste bytes: 481730316
      file data blocks allocated: 67454937907200
       referenced 1007469522944
      I guess I don't have to run btrfs check --clear-space-cache then? Thanks @johnnie.black!
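     A minimal sketch of the space-cache reset that was being considered, in case the warnings return; it assumes the pool device is /dev/sdh1 as above and that the filesystem is unmounted (array stopped) when btrfs check runs:

       # Throw away the v1 free space cache; it is rebuilt automatically on the
       # next mount, so only cached allocation data is lost.
       btrfs check --clear-space-cache v1 /dev/sdh1
       # Alternatively, a one-off mount with clear_cache rebuilds it in place.
       mount -o clear_cache /dev/sdh1 /mnt/cache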