Spitko

Members
  • Posts: 12

  1. This usually happens while compiling code. The behavior is as follows:
     - First, the compile hangs at a certain spot. The system is still responsive, but the CL processes just spin forever.
     - In Task Manager, disk usage for the C: drive is stuck at 100%, though actual I/O is fairly low at this point.
     - Around this time, Windows starts complaining in the event log: "Reset to device, \Device\RaidPort2, was issued." This happens frequently.
     - Eventually, Visual Studio itself hangs, and the system becomes less and less responsive until it requires a manual restart. You can't kill the stuck CL processes, so something is likely hung deep in the driver.

     The VM has three disks:

       <disk type='file' device='disk'>
         <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
         <source file='/mnt/user/vms/Windows 10/vdisk1.img' index='2'/>
         <backingStore/>
         <target dev='hdc' bus='scsi'/>
         <boot order='1'/>
         <alias name='scsi0-0-0-2'/>
         <address type='drive' controller='0' bus='0' target='0' unit='2'/>
       </disk>
       <disk type='block' device='disk'>
         <driver name='qemu' type='raw' cache='writeback' discard='unmap'/>
         <source dev='/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S3Z8NB0M305963H'/>
         <target dev='hdd' bus='scsi'/>
         <address type='drive' controller='0' bus='0' target='0' unit='3'/>
       </disk>
       <hostdev mode='subsystem' type='pci' managed='yes'>
         <driver name='vfio'/>
         <source>
           <address domain='0x0000' bus='0x41' slot='0x00' function='0x0'/>
         </source>
         <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
       </hostdev>

     The compile is happening on the NVMe drive that's passed through at the bottom, but the error points to one of the drives above it. I would suspect the first entry (the OS is installed on that one) given the error and its likely cause, as the middle drive is entirely idle. Ideas?

     For now I've copied the image to a raw NVMe device, which appears to work around the problem, but that is obviously less than ideal from a scaling perspective. As a starting point, I ran memtest overnight and it came back clean.

     Hardware:
     - AMD Threadripper 1950X
     - Asus ROG Zenith Extreme
     - LSI Logic SAS 9207-8i

     Nothing in the Unraid logs (VM or system) corresponds to the event.

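     A hedged sketch of two libvirt tweaks sometimes tried for SCSI reset/timeout storms under heavy I/O. Neither is a confirmed fix for this issue; cache='none' assumes the backing storage supports O_DIRECT, and the iothread assumes an <iothreads>1</iothreads> element is declared at the domain level.

       <!-- Experiment 1: bypass the host page cache on the image-backed disk
            (cache='none' requires O_DIRECT support from the backing filesystem). -->
       <driver name='qemu' type='raw' cache='none' io='native' discard='unmap'/>

       <!-- Experiment 2: give the virtio-scsi controller a dedicated iothread
            (assumes <iothreads>1</iothreads> exists in the domain XML). -->
       <controller type='scsi' index='0' model='virtio-scsi'>
         <driver iothread='1' queues='4'/>
       </controller>
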
  2. Thirding this. My sensors suddenly all vanished (Beta 25, if that helps). Manually editing sensors.conf and removing an extra dash from the jc42 sensor brought them back after they all dropped off the face of the earth. Upon further sleuthing, this doesn't ACTUALLY fix the issue: the sensors command complains and the labels don't work properly if you have overlaps (i.e., "temp1" isn't properly pinned to jc42). The "correct" fix seems to be adding a bus statement before the chip, e.g.:

       chip "k10temp-pci-00c3"
           label "temp2" "CPU Temp"

       bus "i2c-0" "SMBus adapter"
       chip "jc42-i2c-0-19"
           label "temp1" "MB Temp"

     That clears the error, though I'm still having trouble getting the label statement to work properly; time to hit the man pages, I guess.

     HOWEVER, it's worth noting that jc42 sensors are SMBus memory (DIMM) temperature sensors, so this has mostly been a goose chase; motherboard temp can't be read yet because Unraid is still missing a driver for the ITE IT8665E.

     That said, selecting jc42 permanently breaks the plugin, since it won't remove the line from sensors.conf, and once selected it writes a bad line that breaks the sensors command. The current fix is to remove the line from sensors.conf manually. The plugin should either handle SMBus sensors properly or, as a hotfix, just blacklist jc42 and handle the failure mode better.

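     For reference, a minimal sketch of how one might confirm the adapter name that the bus statement has to match (assumptions: i2cdetect comes from i2c-tools and may not be installed on stock Unraid; the chip name is the one from my config above).

       i2cdetect -l                              # list i2c buses, e.g. "i2c-0  smbus  SMBus adapter ..."
       cat /sys/class/i2c-adapter/i2c-*/name     # same information via sysfs
       sensors -u jc42-i2c-0-19                  # raw readings, to check the chip actually responds
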
  3. Two bugs with autofan have cropped up since updating to the latest version and Unraid 6.9 beta:

     1) It does not respect the minimum PWM and will shut fans down entirely when below the temperature range. This is unexpected behavior and should either be toggleable or at least be clearly worded, since many servers use the same fans for general system airflow.

     2) It seems to incorrectly detect the highest disk temperature. Example logs:

       Jul 2 20:40:56 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:42:03 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4021rpm) to: OFF (0% @ 3448rpm)
       Jul 2 20:45:13 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:46:20 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4000rpm) to: OFF (0% @ 3579rpm)
       Jul 2 20:49:30 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:50:37 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4043rpm) to: OFF (0% @ 3448rpm)
       Jul 2 20:52:45 jibril autofan: Highest disk temp is 36C, adjusting fan speed from: OFF (0% @ 0rpm) to: 136 (53% @ 0rpm)
       Jul 2 20:53:52 jibril autofan: Highest disk temp is 35C, adjusting fan speed from: 136 (53% @ 4043rpm) to: OFF (0% @

     Meanwhile I'm getting high-temperature alarms on two drives; the coldest spinning drive is 44C.

     Possibly of interest: Disk 1 was spun down at the time, so Unraid didn't show its temperature, but I noticed later that the logged values do roughly match that drive. It's also the only non-SAS drive in my array.

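     A rough sketch of how one might cross-check what autofan reports, by polling drive temperatures directly with smartctl (assumptions: the /dev/sd? glob and awk fields are examples; some controllers need -d sat, and querying a spun-down drive may wake it).

       for d in /dev/sd?; do
         # SATA drives report attribute 194 Temperature_Celsius; SAS drives report
         # "Current Drive Temperature" in the same -A output.
         t=$(smartctl -A "$d" | awk '/Temperature_Celsius/ {print $10; exit} /Current Drive Temperature/ {print $4; exit}')
         echo "$d: ${t:-n/a}C"
       done
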
  4. Your issue sounds unrelated to mine... you should probably open a new thread.
  5. I've had to turn caching off on all shares, as anything writing to the cache for more than a few moments brings the whole server to a crawl. Writing lots of smaller new files does seem more stable than large file writes, though; not sure yet whether that's a useful data point. Also, if it helps, the SSDs are both ADATA SU635 (ASU635SS-240GQ-R). I knew going in that QLC drives are fairly flawed, but I don't think anything I'm doing here should be hitting the limitations of the tech. The drives are rated for 520/450 MB/s read/write; while some people report lower speeds, those are still an order of magnitude above what I'm getting here.

  6. I've seen a few threads on slow cache, but the performance here isn't "oh, that could be better"; it's typically worse than just writing straight to disk. As a test, I ran `dd if=/dev/zero of=file.test bs=1024k count=8k` and, well, a picture is worth a thousand words:

       853789+0 records in
       853789+0 records out
       6994239488 bytes (7.0 GB, 6.5 GiB) copied, 298.39 s, 23.4 MB/s

     btrfs filesystem df:

       Data, RAID1: total=84.00GiB, used=81.58GiB
       System, single: total=4.00MiB, used=16.00KiB
       Metadata, single: total=1.01GiB, used=156.73MiB
       GlobalReserve, single: total=84.41MiB, used=0.00B

       No balance found on '/mnt/cache'

     I have the Dynamix TRIM plugin installed. I also tried manually trimming /mnt/cache right before running the test, just to make sure it didn't error and was really running.

     Pool setup: Unraid 6.7.2, no useful log output.

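     For completeness, a sketch of the sanity checks I'd run around a test like this (assuming the pool is mounted at /mnt/cache; these are generic btrfs/util-linux commands, not anything Unraid-specific).

       fstrim -v /mnt/cache                 # reports how many bytes were actually trimmed
       btrfs filesystem usage /mnt/cache    # allocation vs. real usage per profile
       # Bypass the page cache so the reported rate reflects device write speed:
       dd if=/dev/zero of=/mnt/cache/file.test bs=1M count=8192 oflag=direct status=progress
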
  7. I'm a programmer, so I'm all too familiar with Unix timestamps. The files were created with this Docker image, by mounting a remote SMB share and copying the files from the old server to the local one. It looks like creating files locally doesn't reproduce this bug; it's likely specific to SMB-to-local transfers.
  8. Update! I think I found the issue; it might be a mix of a Samba/SMB bug and possibly an Unraid bug, or alternatively a bug in Krusader (as shipped by binhex). I did a bit more digging and statted two (different) files, one that worked and one that didn't.

       File: Bad.mp4
       Size: 120134495  Blocks: 234640  IO Block: 4096  regular file
       Device: 21h/33d  Inode: 4157  Links: 1
       Access: (0666/-rw-rw-rw-)  Uid: ( 99/ nobody)  Gid: ( 100/ users)
       Access: 1969-12-31 15:59:59.000000000 -0800
       Modify: 2019-08-16 15:28:00.169449270 -0700
       Change: 2019-08-27 19:07:52.107529449 -0700
       Birth: -

       File: Good.mp4
       Size: 182610839  Blocks: 356664  IO Block: 4096  regular file
       Device: 21h/33d  Inode: 3967  Links: 1
       Access: (0666/-rw-rw-rw-)  Uid: ( 99/ nobody)  Gid: ( 100/ users)
       Access: 2019-08-27 19:21:43.859255029 -0700
       Modify: 2019-08-27 19:52:42.613932984 -0700
       Change: 2019-08-27 19:52:42.613175754 -0700
       Birth: -

     The only real difference I can see is that the bad file has an invalid/missing access time. So, as a test, I ran touch Bad.mp4 and suddenly it works fine.

     As a note, opening the file in a media player doesn't seem to update the access time; I assume this is a (reasonable) optimization, meaning the only way to unstick the bad files is to write a script that touches them all. Which might be a fine workaround, but before I do that, does anyone want to dig deeper here, or have a slightly less brute-force solution?

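     A minimal sketch of that brute-force fix: update only the access time on files whose atime predates a sane cutoff (the bad files report 1969-12-31). The share path is a made-up example; run the -print form first to see what would be touched.

       find /mnt/user/Media -type f ! -newerat "2000-01-01" -print
       find /mnt/user/Media -type f ! -newerat "2000-01-01" -exec touch -a {} +
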
  9. Nope, permissions are identical between the files in question. Names as well; I even tried renaming the old folder, making a new one with the same name as the old, and everything worked fine in that one (i.e., my test program could now create, read, and write files in the new folder). If I rename the old folder back, the problem recurs. I checked permissions with ls -n as well, and made sure there weren't just two groups named "users" or something; the permissions are absolutely identical, unless there's some additional bit/flag I'm not aware of that doesn't show in ls -ln. It's honestly the weirdest dang thing.

     Also, to be clear: the folder exists in /mnt/user, which is where I copied the files to in Docker and where I'm checking permissions from. I assumed copying straight to the diskN paths would be a bad idea (also because I copied more than a drive's worth of data).

     And as a reminder, the weirdest (by far) part is that I can manage the files from Explorer just fine. I can open them in Media Player Classic without issue, I can rename and delete them, etc. But VLC will reliably give a "VLC is unable to open the MRL" error. BUT, if I take the exact same file and copy it to a different folder on the same share with Explorer, it works fine! This is all from the same Windows machine, and at no point am I getting UACed or asked to do anything additional.

     New file permissions:
       -rw-rw-rw- 1 nobody users 605445386 Aug 7 23:41 Test.mp4
     Old file permissions:
       -rw-rw-rw- 1 nobody users 605445386 Aug 7 23:41 Test.mp4

     I can also take this new file, rename it, and copy it back to the old folder, and it still plays fine. And to rule out a VLC-specific quirk, I get similar results in GIMP, so it does seem to center on programs using cross-platform toolkits like GTK. Also worth noting that VLC can play the same file from the old NAS (Synology) just fine, using the same mechanism (a mapped network drive over SMB).

     Edit: I also found another thread from 2018 with the same problem (by searching for the exact VLC error, a cryptic "filesystem error: read error: No error"), but no solution.

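     On the "some additional bit/flag" question, a sketch of metadata checks that go beyond ls -ln (the path is an example; getfattr/getfacl come from the attr/acl packages, and lsattr may not work on the FUSE /mnt/user path).

       getfattr -d -m - /mnt/user/Media/Old/Bad.mp4   # extended attributes (user.*, security.*)
       getfacl /mnt/user/Media/Old/Bad.mp4            # POSIX ACLs beyond the basic mode bits
       lsattr /mnt/user/Media/Old/Bad.mp4             # filesystem attribute flags
       stat /mnt/user/Media/Old/Bad.mp4               # full timestamps, including atime
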
  10. Further testing on this issue:
     1) Found the "Docker Safe New Perms" tool via a plugin; running it didn't yield any different results.
     2) Tried making a new directory and pointing my script at that; it was able to write files fine, and files written this way can be read just fine.
     3) Permissions between the "bad" files and the good ones appear identical, including user and group.
     4) Also tried logging in with a user account. This creates files under the correct user (in the "users" group), but those files can still be read as nobody just fine.

     I'm very confused now. I'd also like to get this sorted out before my trial runs out if possible (one day left), so if anyone has ideas on what to check or try, please let me know.

  11. I couldn't find a "docker safe" anything under Tools, but there was a "New Permissions" tool, which looks like the right thing, perhaps? (unraidip/Tools/NewPerms) I ran it on all disks against one of the shares I've been testing with. No change in behavior.

     Edit: Also, to follow up on permissions, here's one of the affected files:
       -rw-rw-rw- 1 nobody users 976202725 Aug 17 00:50 test.mp4

  12. OK, this is the weirdest thing I've seen, but it's the last quirk preventing me from retiring my Synology box, so here we go.

     I migrated all the data over to the new shares via a Docker image. This seemed to work fine, and I can view the files in Explorer and add/remove/edit/open them just fine. HOWEVER, certain programs are unable to read the files. They can traverse directories, but will either be unable to see any files, or will give obtuse errors when trying to display or open them. I've confirmed this behavior with both GIMP and VLC under Windows: for GIMP, the file-open dialog errors out when files are present in the path, and for VLC it fails to open the file (though the file-open widget itself works fine).

     The shares themselves are pretty straightforward. They're default-configured public shares, and on the Windows machines I've tried both mapping them as network drives and going via UNC paths. I've also confirmed this behavior on two machines with completely different configurations.

     I was able to repro this in some software I wrote, and the behavior is similar: I can check whether a directory exists and it returns the expected result, but checking whether a file exists returns false regardless, and attempting to stat or open a file produces "not found" errors. Running as admin doesn't appear to affect the behavior. The Synology box, however, does not exhibit these problems.

     Any ideas on what might be causing this?

     EDIT: Partial solution/workaround here; may require further investigation to prevent this from happening to others.

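     A minimal sketch of the kind of check described above, not my actual test program; the UNC path is a placeholder for an affected file on the share.

       # repro.py: directory checks succeed while file checks fail for "bad" files
       import os

       path = r"\\TOWER\Media\Old\Bad.mp4"   # hypothetical affected file

       print("dir exists :", os.path.isdir(os.path.dirname(path)))   # reportedly True
       print("file exists:", os.path.isfile(path))                   # reportedly False for bad files
       try:
           print("stat       :", os.stat(path))
       except OSError as e:
           print("stat failed:", e)                                   # "not found"-style error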