Synd

Members
  • Posts: 14

  1. Sorry, I can't share the diagnostics @JorgeB (there's Engineering Sample equipment on my Unraid server; I can't do that due to NDAs).

     Model name: AMD EPYC 7B13 64-Core Processor
     root@Megumin:~# grep -c ^processor /proc/cpuinfo
     128

     The big difference from my look at both of them is the multiple GPUs vs my single one (which is a converged accelerator, which is the ES part).

     root@Megumin:~# du -sh /run/udev/* | sort -h | tail -3
     0    /run/udev/tags
     0    /run/udev/watch
     17M  /run/udev/data

     (this is my udev size; the same checks are repeated as a sketch after this post list)

     I know there were changes in the kernel around udev for multiple GPUs, due to issues bringing them up "fast" enough on AI clusters. I just don't remember which 6.1 kernel version got it, but it's one of the releases between 6.12.6 and 6.12.8; I'm still looking at the patches to see where it was applied, as we upgraded to 6.6 at work to have the patch before it was in 6.1.

     So one thing I'd suggest is making the udev allocation 64MB for 6.13, which would help with weird and complex systems. I know Rome vs Milan also handle udev differently, as Milan was an updated architecture on the CPU side.

     Mobo: ASRock Rack ROMED8-2T
     CPU: Epyc 7B13
     RAM: 1TB
     PCIe expansions:
       1 x LSI 9400-8i (x8)
       3 x ASUS Hyper M.2 cards filled with M.2 drives
       1 x Nvidia A100X ES
       1 x ConnectX-5 Pro
       2 x OCuLink to U.2
       2 x M.2 (on-board)

     Those are all my specs, as I posted yesterday on Discord, to confirm where the issue could be.
  2. Done, and I found the regression in the kernel itself by going through our gits at work plus the ticketing system. Dummy waits were applied to AMD in kernel 5.19 and were patched out for 6.0 (a quick version-window check is sketched after this post list). I shared a lot more details with staff in a private Discord channel, since it meant bringing internal work data into the conversation.
  3. Which is what we found on Discord: Intel is not affected. @Fuggin and @Kilrah ran parity checks too, but only the AMD builds are slowing down. @AgentXXL, @Pri and I are running Epyc, and I know @The_Mountain also has the issue on Epyc. After testing with multiple setups in the #offtopic channel of the Discord, this is a specifically AMD bug.
  4. To add onto this: after more discussion, we found that it's the people with Epyc systems who have had slower parity checks since 6.11, exactly as this bug report describes. The Xeon users are generally fine; the Discord mods ran tests together with some of the most active Discord users.
  5. Hi,

     Since upgrading to 6.11.5, each time I run a parity check, write to the array, or use the mover, parity runs at 60-70MB/s, while before the upgrade it was at 170-180MB/s. The system is all connected over SAS3 equipment, but we've seen the same issue on Discord with others running SAS2 or direct-connect. I included my diagnostics, and others will do so as well, as we agreed on Discord, to give as many data points as possible (a per-disk read benchmark to rule out the drives themselves is sketched after this post list).

     I don't spin down drives or anything. I run reconstruct write all the time, but it's almost like the setting doesn't work at all anymore.

     This is a bug that impacts AMD only. There was a regression in kernel 5.19 around dummy waits that impacted AMD CPUs (https://www.phoronix.com/news/Linux-6.0-AMD-Chipset-WA).

     Thanks.

     megumin-diagnostics-20221221-1116.zip
  6. After updating to RC4, the eth1 NIC completely disappeared and only eth3 is left; some reboots have both NICs working together. I reduced the priority to "annoyance" and added a new set of diagnostics to help compare rc3o, rc3 and rc4, in case there's something in them that could explain it for the future. Thanks. megumin-diagnostics-20220319-2132.zip
  7. Hi,

     I was doing tests with the rc3n/o versions and it did this as well. My NIC gets renamed from eth1 to eth3, or the opposite, each time I reboot the system. I changed the assignment from eth4/5 to eth0/1 during the RC2 cycle, and once I moved to the test versions (n or o) or RC3, it started doing this on every reboot. I also see the NICs duplicated in my network_rules.cfg on some reboots (the rule format is sketched after this post list).

     I uploaded diagnostics from two different reboots to show the issues I'm hitting as well. My config runs the Mellanox NICs in LACP, as that's the only bonding my switches allow.

     Thanks,
     Synd.

     megumin-diagnostics-20220310-1650.zip megumin-diagnostics-20220310-1321.zip
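
Sketch for post 1: the same inspection can be repeated on another box with the commands below. This is my own summary of the checks already quoted in the post, assuming (as the post implies) that /run/udev is a size-limited tmpfs on Unraid; if it is not a separate mount, df will simply report the filesystem that contains /run instead.

    # Logical CPU count, as quoted in the post
    grep -c ^processor /proc/cpuinfo

    # Space currently used by the udev runtime database
    du -sh /run/udev/data

    # Size and usage of the tmpfs backing it
    df -h /run/udev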
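
Sketch for post 2: a quick way to see whether a given machine is running a kernel inside the affected window (5.19 or newer, but older than the 6.0 fix). This is my own check, not something from the original posts.

    # Sort the running kernel version against the window boundaries; if it
    # lands between 5.19 and 6.0, the dummy-wait behaviour is still present.
    KVER=$(uname -r | cut -d- -f1)
    printf '%s\n' 5.19 "$KVER" 6.0 | sort -V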
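
Sketch for post 5: a simple way to separate a kernel regression from a slow drive or link is to benchmark raw sequential reads on each array device while the array is otherwise idle. This is a generic check of my own, not taken from the diagnostics; adjust the device glob to match your system.

    # Rough sequential-read benchmark per drive (run as root on an idle array)
    for d in /dev/sd?; do
        echo "== $d =="
        hdparm -t "$d"
    done
    # If every drive individually reads well above the 60-70MB/s parity speed,
    # the bottleneck is unlikely to be the disks or the SAS links.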
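
Sketch for post 7: as far as I know, network_rules.cfg follows the standard udev persistent-net rule format, one line per interface keyed on the MAC address, which is what pins eth names across reboots. The MAC addresses below are placeholders, not values from the diagnostics.

    # Pin each Mellanox port to a fixed name by MAC address
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:dd:ee:f0", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="aa:bb:cc:dd:ee:f1", ATTR{dev_id}=="0x0", ATTR{type}=="1", KERNEL=="eth*", NAME="eth1"
    # Duplicate lines for the same MAC, as described in the post, would make the
    # rename non-deterministic, so any doubled entries should be removed.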