
Percutio

Members
  • Posts: 12
  • Joined

  • Last visited


Percutio's Achievements

Noob (1/14)

Reputation: 0

Community Answers: 1

  1. From some of the charts I have looked at, SMB Direct is available for the Workstation, Enterprise, and Education editions of Windows, excluding the server editions. I am using Windows 11 Education in this test, but Education's feature set has been difficult to track over the years: when I first got Windows 10 Education it was basically Enterprise without Cortana plus a little extra, whereas Windows 11 Education has a lot more differences... EDIT: This is what I see in the Windows Features window, so hopefully this does mean I can get RDMA working someday when I have more time again. (A quick PowerShell check is sketched below.)
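     As a rough way to confirm the same thing from PowerShell, something like the following should work on a client edition. This is only a sketch: the optional feature name ("SmbDirect") and whether all of this applies to Windows 11 Education are my assumptions.

         # Is the SMB Direct optional feature present/enabled on this edition?
         # (feature name assumed to be "SmbDirect"; run from an elevated prompt)
         Get-WindowsOptionalFeature -Online -FeatureName SmbDirect |
             Select-Object FeatureName, State

         # Does the NIC driver expose RDMA to Windows at all?
         Get-NetAdapterRdma | Select-Object Name, Enabled

         # Multichannel has to be on for SMB Direct to be used
         Get-SmbClientConfiguration | Select-Object EnableMultiChannel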
  2. From what I understand, RDMA is no longer prototype/experimental, since it is part of the server multi channel feature (out of experimental since the Samba 4.15 release) and it is somewhat detailed under the interfaces option in the documentation: https://www.samba.org/samba/docs/current/man-html/smb.conf.5.html . This gives me some confidence that I won't run into bugs when using RDMA.

     I tried this on Unraid 6.12.2 with a Windows 11 Education client, both machines using MCX314A-BCCT NICs. The machines are directly connected since I don't have 40gbps network switches, and I don't think I can get multi-NIC SMB to work when the two ports on each machine are connected to different networks (my home network and the direct connection between the two machines).

     The NIC on the Windows side does show it is RSS and RDMA capable from this command:

         PS C:\Windows\system32> Get-SmbClientNetworkInterface

         Interface Index RSS Capable RDMA Capable Speed   IpAddresses  Friendly Name
         --------------- ----------- ------------ -----   -----------  -------------
         16              True        True         40 Gbps {10.6.13.18} mlx_direct_40g

     For the SMB Extras configuration, I have this:

         interfaces = "10.6.13.17;capability=RSS,capability=RDMA,speed=40000000000" "10.32.0.46;capability=RSS,speed=10000000000"

     I only set up the 40gbps direct connection for RDMA since I don't have any other clients on my home network that can use it. Maybe someday in the future, since I am considering building a test bench PC, I'll just get another Mellanox card to pair with it.

     The Unraid machine is using a 6-core i5-8600K and the Windows machine a 12-core i7-12700K, so I am not sure why there are only 4-5 TCP connections between the two machines...

         root@Alagaesia:~# netstat -tnp | grep smb
         tcp      234      0 10.6.13.17:445  10.6.13.18:50885  ESTABLISHED 3983/smbd
         tcp        0      0 10.6.13.17:445  10.6.13.18:58921  ESTABLISHED 28874/smbd
         tcp        0      0 10.32.0.46:445  10.32.1.32:42472  ESTABLISHED 26795/smbd
         tcp      117   1528 10.6.13.17:445  10.6.13.18:50883  ESTABLISHED 3983/smbd
         tcp      234      0 10.6.13.17:445  10.6.13.18:50577  ESTABLISHED 3983/smbd
         tcp      234      0 10.6.13.17:445  10.6.13.18:50884  ESTABLISHED 3983/smbd

     I'm assuming 4 of those connections are for RSS and one of them is for RDMA. This next command on the Windows machine is a little interesting, though:

         PS C:\Windows\system32> Get-SmbMultichannelConnection -IncludeNotSelected

         Server Name Selected Client IP   Server IP  Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
         ----------- -------- ---------   ---------  ---------------------- ---------------------- ------------------ -------------------
         10.6.13.17  False    10.6.13.18  10.6.13.17 16                     7                      False              True
         10.6.13.17  True     10.6.13.18  10.6.13.17 16                     7                      True               False
         10.6.13.17  False    172.20.0.46 10.32.0.46 4                      10                     True               False
         10.6.13.17  False    172.20.0.46 10.6.13.17 4                      7                      False              True
         10.6.13.17  False    172.20.0.46 10.6.13.17 4                      7                      True               False

     The connections that are probably relevant here are those between 10.6.13.18 and 10.6.13.17. For some reason I have one connection where RSS is enabled and RDMA is disabled, and another where it is vice versa. I don't know if this is by design or whether there is some other setting I missed that would make a single connection both RSS and RDMA capable. I don't have time for further testing for a while, but it seems like the RSS-capable connection is selected most of the time, while I only saw the RDMA-capable connection get selected once in a random test. (A sketch of the corresponding smb.conf block is below.)
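     For reference, a minimal sketch of what the full SMB Extras block might look like with multichannel switched on explicitly, assuming the "server multi channel support" option from the smb.conf man page is the relevant switch and is not already enabled by default on this Samba version. The interfaces line is the one I am actually using; the rest is not confirmed from my working config.

         # Sketch only: enable SMB3 multichannel and advertise per-interface
         # capabilities, per the smb.conf(5) documentation
         [global]
             server multi channel support = yes
             interfaces = "10.6.13.17;capability=RSS,capability=RDMA,speed=40000000000" "10.32.0.46;capability=RSS,speed=10000000000"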
  3. Here is the problem I am having: I have two shares, Share1 and Share2. Share1 is mounted on the NFS client while Share2 is not. I want symlinks to work between NFS shares (i.e. a symlink in Share1 linking to folders in Share2); a sketch of this setup is below. I do not want to mount Share2 on the NFS client, so relative symlinks to Share2 don't work (I'll go into more detail about this further down). CIFS/SMB clients do not have this issue, so a symlink in Share1 linking to files in Share2 does in fact work there. I would prefer to keep using NFS if possible since the client is Linux based.

     To my understanding, I'll never be able to get such a symlink to work unless I mount Share2 on the NFS client, so is there an alternative solution that doesn't use symlinks? Is there any way for Share1 to link to files in Share2 without mounting Share2 (for NFS clients)?

     For additional context: I am mounting these shares on a Proxmox host, so the only ways to mount NFS/SMB shares are via the Storage tab or the command line. I don't know whether mounting NFS/SMB shares from the command line would break future Proxmox upgrades, or how it would show up in the Storage tab, so I do not want to do it that way. Mounting NFS/SMB shares via the Storage tab creates additional folders in the share that I cannot delete; I am alright with that in Share1, but not in Share2.

     Currently my workaround is to use an SMB share, since SMB is able to resolve these symlinks without any issues. I'd prefer to use NFS if there is an actual solution to this.
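     To make the failure mode concrete, this is roughly what the setup looks like; the paths and server address below are hypothetical placeholders rather than my actual layout:

         # On the Unraid server: a symlink inside Share1 pointing into Share2
         ln -s /mnt/user/Share2/data /mnt/user/Share1/data-link

         # On the Proxmox host: only Share1 is mounted over NFS
         mount -t nfs 192.168.1.12:/mnt/user/Share1 /mnt/pve/share1

         # The symlink itself comes across, but its target path only exists on
         # the server, so from the client's point of view it is a dangling link
         ls -l /mnt/pve/share1/data-link
         cat /mnt/pve/share1/data-link/somefile   # -> No such file or directory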
  4. So, that night, I did a couple of things: I blew some air over the CPU pins (the socket and pins themselves didn't look like they had any issues visually, though), and I retightened the CPU mounting system, since SecuFirm2 seems to prevent me from opening the retention bracket without taking the mount apart. Maybe the issue has something to do with contact pressure. Ever since then, I have had no dirty reboots/crashes. The last few days I did have to restart or shut down the system to test a new UPS, but otherwise nothing that would trigger a ZFS scrub or similar, and there haven't been any segfaults since either. I'll just close this topic now and very much hope these dirty reboots/crashes are gone. In the coming weeks, I plan to set up some docker containers/services and overall just use the machine as a NAS. If this issue pops up again during that, I'll update this thread.
  5. There were some additional kernel logs about the CPU on the 4th dirty reboot so far, so I'll try reseating the CPU and using some air to clean the socket pins. I'll try doing that tonight and in the meantime just keep the machine shut down. syslog-192.168.1.12 (1).log
  6. What other hardware troubleshooting steps are there left to do? At the very least, I doubt the memory sticks have any problems, unless I have been unlucky with two different memory vendors and Windows simply can't detect the problems.
  7. I am still getting crashes/dirty reboots with the new RAM. This RAM has no problems whatsoever in my main Windows machine, and when I installed it in the Unraid machine it did not show any errors when I ran TM5 tests with anta777 configs; I even had the Extreme1 config running for 24+ hours and nothing happened. This time around I caught the Unraid machine rebooting/crashing twice within around 30 minutes of each other while I was looking into issues with the ZFS replication. They happened around 9:00am Feb 7 and 9:30am Feb 7. EDIT: It crashed/rebooted yet again at around 9:48 just as I was about to bring up all the kernel logs in the Tools page. syslog-192.168.1.12.log
  8. I'll try swapping the RAM of my current Windows machine with the RAM in the Unraid machine tomorrow and run TM5 on it. The RAM from my current Windows machine is not on the QVL for the Unraid machine's motherboard, so I am hoping that won't cause issues either... EDIT: I have had the new RAM running TM5 for a couple of days now and there were no errors; I even had the Extreme1 config running for around 24 hours. Now the machine is running Unraid again, and soon I'll be looking into doing a ZFS replication to another pool, which should use a lot of RAM in the process and serve as a real-world test.
  9. I would also like to note that the lsof segfaults could have something to do with how lsof had been pinning one of the CPU cores at 100%. Ever since I uninstalled a plugin I suspected was running the lsof command, I don't think I have seen a CPU core pinned at 100% like that. (A rough way to catch what was launching lsof is sketched below.)
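     This is just a generic shell sketch for identifying the parent of whatever keeps spawning lsof, not something taken from the diagnostics; the polling interval and details are arbitrary:

         # Poll for a running lsof and print the process that launched it
         while true; do
             pid=$(pgrep -o -x lsof)
             if [ -n "$pid" ]; then
                 ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
                 echo "lsof pid=$pid launched by:"
                 ps -o pid=,args= -p "$ppid"
             fi
             sleep 5
         done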
  10. I replaced them sometime in mid-December. I am hoping that these RAM sticks don't have any issues, because even when I do get around to RMAing the previous RAM sticks, those sticks are not on the QVL while the current ones are. Edit: Additional history behind all the RAM sticks:

      The current 2x8GB sticks are the RAM that used to run in this machine back when it was a Windows machine. I don't remember regular blue screens happening with this RAM, though. I also used to have another 2x8GB kit of the same SKU, but likely because I moved the RAM around a lot while troubleshooting earlier RAM issues, one of those sticks had a capacitor break off, so I took that pair out.

      The previous 2x16GB sticks were part of my current Windows machine. I had numerous blue screen issues with various error codes that I could only attribute to the RAM being faulty in some way. It didn't matter whether I turned XMP on or not, the blue screens still occurred. Running TM5 on both my current Windows machine and on the Unraid machine booted into Windows had TM5 reporting issues, but Memtest86 was never able to detect any errors at all.

      My current Windows machine is using 2x16GB sticks that I made sure were on the QVL, and it has been stable with no blue screens since. The QVL says up to 4 sticks of this SKU are supported, so I am willing to consider buying more of this SKU and using 2 of them in the current Unraid machine.
  11. I have had the syslog server running since around Jan 2. The most recent crash happened around Jan 30 at 12:00:22. syslog-192.168.1.12.log
  12. Regularly getting unclean crashes/reboots. I have not been able to pin down why my Unraid system does these unclean crashes/reboots, which sometimes happen within hours of booting up and sometimes days apart. The last unclean crash/reboot was just hours ago, and Unraid had been running for 2-3 days. Usually by the time I notice, Unraid has already done an unclean reboot and I see that scrubs/parity checks are running. Sometimes I notice that the machine is unresponsive, and when I run to it and turn on the monitor nothing gets displayed, likely because it has crashed.

      Some of the things I have tried:
      • This Saturday I replugged all the power cables I could find; this did not fix the issue.
      • I replaced the memory sticks. The previous sticks passed MemTest86, but they would sometimes fail when I ran TM5 with @anta777 configs and even blue screen the system at times. The current sticks passed Memtest86 and did not error out or blue screen when running TM5. Replacing the memory did not fix the issue either.
      • This machine didn't seem to have issues when running Windows 10. I did, however, move it to a new chassis when I converted it into an Unraid machine. I still needed to boot Windows to run TM5, and nothing seemed to happen during those memory tests. I should note that because the unclean crashes/reboots happen regularly but at random intervals, it is possible I encountered this issue under Windows too and simply did not give it enough runtime to show up.

      Edit: Also tried connecting the machine to a UPS in case this was caused by dirty power. Even on a line interactive UPS the issue occurs, and I don't hear the UPS screaming at me the way it does when mains power gets cut during a blackout. (A rough way to scan the mirrored syslog is sketched below.) alagaesia-diagnostics-20230130-1416.zip
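      Since the mirrored syslog is the only record that survives these resets, here is a rough way to scan it for hardware-related messages around a crash, using the syslog file attached earlier in the thread; the grep patterns are just my guesses at what might be worth looking for:

          # Look for machine-check / hardware error / segfault lines
          grep -iE 'mce|machine check|hardware error|segfault' syslog-192.168.1.12.log

          # Show what was logged in the hour leading up to the Jan 30 ~12:00 crash
          grep -n 'Jan 30 1[12]:' syslog-192.168.1.12.log | tail -n 20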