Report Comments posted by turnipisum

  1. Well, I'm hoping I have finally sorted it! I've never had an uptime that long since the machine was built.

    [Screenshot of uptime, 2021-10-03]

     

    I found out that the 128GB of Corsair LPX 16GB DIMMs in the server have different version numbers, which correspond to different memory chips! Luckily I had more DIMMs in another machine, so I managed to put together a 128GB set with matching chips, and it looks like that has got me sorted at long last.

     

    Link to the quote below from Reddit about version numbers.

     

    Quote

    Corsair

    "Version Number"

    Corsair sticks identify the IC with a 'version number' on the label such as "ver4.31" - props to them for this as it helps even less knowledgeable users to match kits when adding more sticks retroactively. The DDR4 numbers aren't officially documented, but they follow the same pattern as DDR3.

    The numbers take the "ver X.YZ" format, where:
    * X is the IC maker: 3 for Micron/Spectek, 4 for Samsung, 5 for Hynix, 8 for Nanya, as with DDR3.
    * Y seems to be capacity per rank: 1 for 2GB, 2 for 4GB, 3 for 8GB, 4 for 16GB. Usually this translates directly to IC density (8GB/rank = 8Gbit), but ver4.14, which uses half as many double-width "x16" 4Gbit chips, is a special case.
    * Z is the revision, usually starting from A=0 and usually counting up one letter per increment. Hynix's first revisions are lettered "M", which is numbered as X.Y9; Samsung now does this too, and it will presumably be the same.

    Micron ICs seem to be numbered oddly with different "version numbers" for different JEDEC bins, and different revisions under the same "version number".
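
    As a rough illustration of the pattern (a sketch only; the helper name and maps here are made up, and per the note above Micron's 3.xx numbers don't decode cleanly):

      # Sketch: decode a Corsair "ver X.YZ" label per the pattern above.
      # Illustrative only; Micron (3.xx) numbering is known to deviate.
      MAKERS = {"3": "Micron/Spectek", "4": "Samsung", "5": "Hynix", "8": "Nanya"}
      GB_PER_RANK = {"1": 2, "2": 4, "3": 8, "4": 16}

      def decode_version(ver: str):
          x, yz = ver.removeprefix("ver").split(".")
          maker = MAKERS.get(x, "unknown")
          gb_per_rank = GB_PER_RANK.get(yz[0])
          z = int(yz[1])
          revision = "M" if z == 9 else chr(ord("A") + z)  # X.Y9 = first "M" revision
          return maker, gb_per_rank, revision

      print(decode_version("ver4.31"))  # ('Samsung', 8, 'B')  i.e. 8Gbit B-die
      print(decode_version("5.29"))     # ('Hynix', 4, 'M')    i.e. 4Gbit MFR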

    The known and possible version numbers are as follows:

    Version | Vendor  | IC                                 | Confirmation?
    --------|---------|------------------------------------|----------------------------------------
    3.20    | Micron  | 4Gbit Rev.A                        | Presumed
    3.21    | Micron  | 4Gbit Rev.B                        | Confirmed
    3.22    | Micron  | 4Gbit Rev.E*                       | Speculated
    3.22    | Micron  | 4Gbit Rev.F*                       | Confirmed
    3.31    | Micron  | 8Gbit Rev.B                        | Confirmed
    3.31    | Micron  | 8Gbit Rev.D                        | Presumed
    3.31    | Micron  | 8Gbit Rev.E                        | Confirmed
    3.32    | Micron  | 8Gbit Rev.H                        | Confirmed
    3.32    | Micron  | ??????????                         | wk27 '17 2x8GB 2666 16-18-18-36 1.2V
    3.32    | Micron  | ??????????                         | wk46 '19 2x8GB 3000 15-17-17-35 1.35V
    3.40    | Micron  | 16Gbit Rev.B (2133 bin)            | Confirmed
    3.41    | Micron  | ??????????                         | wk44 '20 2x16GB 3600 18-22-22-42 1.35V
    3.43    | Micron  | ??????????                         | wk43 '20 2x16GB 3200 16-19-19-36 1.35V
    3.43    | Micron  | 16Gbit Rev.E??? (or bad bin Rev.B) | wk51 '20 2x16GB 3200 16-20-20-38 1.35V
    3.44    | Micron  | 16Gbit Rev.B (2666 bin)            | Confirmed
    4.14    | Samsung | 4Gbit D-die (4x16)                 | Confirmed
    4.23    | Samsung | 4Gbit D-die                        | Confirmed
    4.24    | Samsung | 4Gbit E-die                        | Confirmed
    4.21    | Samsung | 8Gbit B-die (4x16)                 | Presumed
    4.31    | Samsung | 8Gbit B-die                        | Confirmed
    4.31    | Samsung | 8Gbit C-die**                      | Presumed
    4.32    | Samsung | 8Gbit C-die                        | Confirmed
    4.33    | Samsung | 8Gbit D-die                        | Presumed
    4.34    | Samsung | 8Gbit E-die                        | Presumed
    4.49    | Samsung | 16Gbit M-die                       | Presumed
    4.40    | Samsung | 16Gbit A-die                       | Speculated
    5.29    | Hynix   | 4Gbit MFR                          | Confirmed
    5.20    | Hynix   | 4Gbit AFR                          | Confirmed
    5.21    | Hynix   | 4Gbit BJR                          | Speculated
    5.22    | Hynix   | 4Gbit CJR                          | Presumed
    5.39    | Hynix   | 8Gbit MFR                          | Confirmed
    5.30    | Hynix   | 8Gbit AFR                          | Confirmed
    5.31    | Hynix   | 8Gbit "BFR"???                     | Speculated
    5.32    | Hynix   | 8Gbit CJR                          | Confirmed
    5.33    | Hynix   | 8Gbit DJR                          | Presumed
    5.38    | Hynix   | 8Gbit JJR                          | Presumed
    5.49    | Hynix   | 16Gbit MJR                         | Presumed
    8.20    | Nanya   | 4Gbit Rev.A                        | Speculated
    8.21    | Nanya   | 4Gbit Rev.B***                     | Presumed
    8.23    | Nanya   | 4Gbit Rev.D***                     | Presumed
    8.30    | Nanya   | 8Gbit Rev.A                        | Presumed
    8.31    | Nanya   | 8Gbit Rev.B****                    | Confirmed

    Especially with Micron, Corsair version numbers are sometimes weird. "Confirmed" means an IC has been seen under a version number, not that the number can't also cover something else.

    *Rev.F is confirmed to come in ver3.22 sticks, but that doesn't leave a gap for Rev.E. It's a wild guess that they may both appear under 3.22.
    **TechPowerUp recently got a sample kit of Vengeance RGB Pro SL 2x8GB 3600c18 under this version; however, the chips had SAC marks on them (which by Corsair's IC labeling scheme would indicate C-die) and behaved like C-die in OCing.
    ***Version number seen in the wild, IC unconfirmed.
    ****Deduced from the NAB... Corsair code on the ICs, as well as a Corsair rep's statement, according to one post from China.

    Date code

    The first 4 digits of a Corsair serial number are a date code in the form yyww; e.g. 1528 is week 28 of 2015.
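
    A quick sketch of that decode (Python, purely illustrative; assumes a 20xx year):

      # First 4 digits of the serial are a yyww date code, e.g. "1528" = week 28 of 2015.
      def decode_date_code(serial: str) -> str:
          yy, ww = serial[:2], serial[2:4]
          return f"week {int(ww)} of 20{yy}"

      print(decode_date_code("1528"))  # week 28 of 2015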

    Corsair relabeled ICs

    Some ICs loaded into Corsair sticks have been shown to carry a marking with a Corsair logo and two text lines: the first presumably states the IC configuration, and the second features an internal Corsair code that seems to correspond to the IC manufacturer and stepping, as well as a yyww-format date at the end. Unfortunately, this kind of marking has only been confirmed on some ver5.xx (Hynix) and 8.xx (Nanya) sticks. Samsungs (ver4.xx) may have it too, but as seen in the ver4.31 example, it may collide with the version number scheme.

    Version         | Code   | IC                  | Original partial mark
    ----------------|--------|---------------------|-------------------------------------
    4.31            | SAC... | Samsung 8Gbit C-die | none, determined by OCing behaviour
    5.20            | HYA... | Hynix 4Gbit AFR     | DWMF...
    5.30            | HYA... | Hynix 8Gbit AFR     | DTCC...
    5.32            | HYC... | Hynix 8Gbit CJR     | DTBM... / none
    - (ValueSelect) | NAA... | Nanya 8Gbit A-die?  | arbitrary Nanya
    8.31            | NAB... | Nanya 8Gbit B-die?  | arbitrary Nanya

     

  2. AMD 3970X; full spec below. The issue is that the 3 VMs are in daily use, so I will have to build temp machines in order to take them down for days.

     

    Case: Corsair Obsidian 750d | MB: Asrock Trx40 Creator | CPU: AMD Threadripper 3970X | Cooler: Noctua NH-U14S | RAM: Corsair LPX 128GB DDR4 C16 | GPU: 2 x MSI RTX 2070 Supers | Cache: Intel 660p Series 1TB M.2 x2 in 2TB Pool | Parity: Ironwolf 6TB | Array Storage: Ironwolf 6TB + Ironwolf 4TB | Unassigned Devices: Corsair 660p M.2 1TB + Kingston 480GB SSD + Skyhawk 2TB | NIC: Intel 82576 Chip, Dual RJ45 Ports, 1Gbit PCI | PSU: Corsair RM1000i

  3. Update to my last post.

     

    It didn't fix it! ☹️ I got a random uptime of 47 days, but now I'm back to crashing every 1-4 days or so.

    I have just disabled the PCIe ACS override to see if that does anything.
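
    (For reference, that toggle maps to a kernel boot parameter from the ACS override patch; a sketch of the relevant append line in /boot/syslinux/syslinux.cfg when it's on, exact line will vary per setup:)

      # with ACS override enabled the append line carries something like:
      append pcie_acs_override=downstream,multifunction initrd=/bzroot
      # turning the toggle off in VM Manager just drops the pcie_acs_override=... token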

     

    Same error as I always have in the logs.

     

    Mar  6 19:50:21 SKYNET-UR kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TRX40 Creator, BIOS P1.70 05/29/2020
    Mar  6 19:50:21 SKYNET-UR kernel: RIP: 0010:__iommu_dma_unmap+0x7a/0xe8
    Mar  6 19:50:21 SKYNET-UR kernel: Code: 46 28 4c 8d 60 ff 48 8d 54 18 ff 49 21 ec 48 f7 d8 4c 29 e5 49 01 d4 49 21 c4 48 89 ee 4c 89 e2 e8 8f df ff ff 4c 39 e0 74 02 <0f> 0b 49 83 be 68 07 00 00 00 75 32 49 8b 45 08 48 8b 40 48 48 85
    Mar  6 19:50:21 SKYNET-UR kernel: RSP: 0018:ffffc9000468f9f8 EFLAGS: 00010206
    Mar  6 19:50:21 SKYNET-UR kernel: RAX: 0000000000002000 RBX: 0000000000001000 RCX: 0000000000000001
    Mar  6 19:50:21 SKYNET-UR kernel: RDX: ffff888102d55020 RSI: ffffffffffffe000 RDI: 0000000000000009
    Mar  6 19:50:21 SKYNET-UR kernel: RBP: 00000000fed7e000 R08: ffff888102d55020 R09: ffff8881525b2bf0
    Mar  6 19:50:21 SKYNET-UR kernel: R10: 0000000000000009 R11: ffff888000000000 R12: 0000000000001000
    Mar  6 19:50:21 SKYNET-UR kernel: R13: ffff888102d55010 R14: ffff88813d301000 R15: ffffffffa00da640
    Mar  6 19:50:21 SKYNET-UR kernel: FS:  0000148f90fb0740(0000) GS:ffff889fdd840000(0000) knlGS:0000000000000000
    Mar  6 19:50:21 SKYNET-UR kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Mar  6 19:50:21 SKYNET-UR kernel: CR2: 0000150fa8003340 CR3: 000000014cd8a000 CR4: 0000000000350ee0
    Mar  6 19:50:21 SKYNET-UR kernel: Call Trace:
    Mar  6 19:50:21 SKYNET-UR kernel: iommu_dma_free+0x1a/0x2b

     

  4. Is it just the VM crashing, or Unraid as well? And do you pass through a GFX card?

    I had issues related to QEMU with my VMs crashing, taking down the VM as well as Unraid; the solution was to change the machine type to an older one.

  5. Update!

    Looks like I have finally found the fix for my lock-ups! It appears to have been a VM QEMU issue. I changed my machine type from q35-5.1 to q35-4.2 and have not had an issue since. Now on 18 days of uptime.

    I had already changed from i440fx to q35, but had both on 5.1, so I'm guessing that i440fx-4.2 would work fine in my case as well. I want to get 30 days of uptime to be sure, then I will try i440fx-4.2 and see what happens.
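
    For anyone wanting to try the same: the machine type lives in the VM's XML (edit the VM with the XML view toggled on). A minimal sketch of the relevant line, assuming a typical libvirt Q35 definition:

      <!-- inside the <os> block; was machine='pc-q35-5.1' -->
      <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>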

     

  6. Update!

     

    I went to 6.9.0-rc1, updated the Nvidia drivers on both VMs, and got to almost 9 days up! Then I updated to rc2 and within 48hrs I had 2 lock-ups, so it's still plaguing me! 🤷‍♂️

     

    I have just redone the 2 VMs on new templates using q35-5.1 (was on i440fx) with the new VirtIO drivers 0.1.190, so we will see if that makes any difference.

     

    But in all lock-ups it seems to be an IOMMU issue in my case.

    Dec 22 21:30:45 SKYNET-UR kernel: RIP: 0010:__iommu_dma_unmap+0x7a/0xe8
    Dec 22 21:30:45 SKYNET-UR kernel: Code: 46 28 4c 8d 60 ff 48 8d 54 18 ff 49 21 ec 48 f7 d8 4c 29 e5 49 01 d4 49 21 c4 48 89 ee 4c 89 e2 e8 8f df ff ff 4c 39 e0 74 02 <0f> 0b 49 83 be 68 07 00 00 00 75 32 49 8b 45 08 48 8b 40 48 48 85
    Dec 22 21:30:45 SKYNET-UR kernel: RSP: 0018:ffffc900018239f8 EFLAGS: 00010206
    Dec 22 21:30:45 SKYNET-UR kernel: RAX: 0000000000002000 RBX: 0000000000001000 RCX: 0000000000000001
    Dec 22 21:30:45 SKYNET-UR kernel: RDX: ffff888100066e20 RSI: ffffffffffffe000 RDI: 0000000000000009
    Dec 22 21:30:45 SKYNET-UR kernel: RBP: 00000000fed7e000 R08: ffff888100066e20 R09: ffff8881596d6bf0
    Dec 22 21:30:45 SKYNET-UR kernel: R10: 0000000000000009 R11: ffff888000000000 R12: 0000000000001000
    Dec 22 21:30:45 SKYNET-UR kernel: R13: ffff888100066e10 R14: ffff88813da76000 R15: ffffffffa00e0640
    Dec 22 21:30:45 SKYNET-UR kernel: FS:  000014ebd85ae740(0000) GS:ffff889fdd180000(0000) knlGS:0000000000000000
    Dec 22 21:30:45 SKYNET-UR kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 22 21:30:45 SKYNET-UR kernel: CR2: 000014ebd8740425 CR3: 000000015c5a2000 CR4: 0000000000350ee0
    Dec 22 21:30:45 SKYNET-UR kernel: Call Trace:
    Dec 22 21:30:45 SKYNET-UR kernel: iommu_dma_free+0x1a/0x2b

     

  7. Well, I updated my 2 Win10 VMs with the latest Nvidia drivers and have not had a lockup since. I got to almost 5 days, but now I've updated to RC1 and done a few other things, like putting memory back to 2666MHz, changing some CPU pinning, and swapping around some USB passthrough. So we will see how it goes 🤞

  8. All seems good on the update, but I did get a call trace on boot. Everything seems to be running fine and it didn't lock up.

     

    Dec 10 19:15:26 SKYNET-UR kernel: ------------[ cut here ]------------
    Dec 10 19:15:26 SKYNET-UR kernel: WARNING: CPU: 3 PID: 7743 at drivers/iommu/dma-iommu.c:471 __iommu_dma_unmap+0x7a/0xe8
    Dec 10 19:15:26 SKYNET-UR kernel: Modules linked in: nfsd lockd grace sunrpc md_mod nct6683 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha bonding atlantic igb i2c_algo_bit r8169 realtek mxm_wmi wmi_bmof edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd btusb glue_helper btrtl btbcm rapl btintel r8125(O) ahci bluetooth libahci ecdh_generic ecc nvme i2c_piix4 nvme_core ccp k10temp i2c_core wmi button acpi_cpufreq [last unloaded: atlantic]
    Dec 10 19:15:26 SKYNET-UR kernel: CPU: 3 PID: 7743 Comm: ethtool Tainted: G           O      5.9.13-Unraid #1
    Dec 10 19:15:26 SKYNET-UR kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TRX40 Creator, BIOS P1.70 05/29/2020
    Dec 10 19:15:26 SKYNET-UR kernel: RIP: 0010:__iommu_dma_unmap+0x7a/0xe8
    Dec 10 19:15:26 SKYNET-UR kernel: Code: 46 28 4c 8d 60 ff 48 8d 54 18 ff 49 21 ec 48 f7 d8 4c 29 e5 49 01 d4 49 21 c4 48 89 ee 4c 89 e2 e8 90 df ff ff 4c 39 e0 74 02 <0f> 0b 49 83 be 68 07 00 00 00 75 32 49 8b 45 08 48 8b 40 48 48 85
    Dec 10 19:15:26 SKYNET-UR kernel: RSP: 0018:ffffc90001b7ba40 EFLAGS: 00010206
    Dec 10 19:15:26 SKYNET-UR kernel: RAX: 0000000000002000 RBX: 0000000000001000 RCX: 0000000000000001
    Dec 10 19:15:26 SKYNET-UR kernel: RDX: ffff889fd593e820 RSI: ffffffffffffe000 RDI: 0000000000000009
    Dec 10 19:15:26 SKYNET-UR kernel: RBP: 00000000fed6e000 R08: ffff889fd593e820 R09: ffff889f7effdb70
    Dec 10 19:15:26 SKYNET-UR kernel: R10: 0000000000000009 R11: ffff888000000000 R12: 0000000000001000
    Dec 10 19:15:26 SKYNET-UR kernel: R13: ffff889fd593e810 R14: ffff889f99db6800 R15: ffffffffa012a600
    Dec 10 19:15:26 SKYNET-UR kernel: FS:  000015297c8f9740(0000) GS:ffff889fdd0c0000(0000) knlGS:0000000000000000
    Dec 10 19:15:26 SKYNET-UR kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    Dec 10 19:15:26 SKYNET-UR kernel: CR2: 000015297ca8b425 CR3: 0000001f7d09c000 CR4: 0000000000350ee0
    Dec 10 19:15:26 SKYNET-UR kernel: Call Trace:
    Dec 10 19:15:26 SKYNET-UR kernel: iommu_dma_free+0x1a/0x2b
    Dec 10 19:15:26 SKYNET-UR kernel: aq_ptp_ring_free+0x31/0x60 [atlantic]
    Dec 10 19:15:26 SKYNET-UR kernel: aq_nic_deinit+0x4e/0xa4 [atlantic]
    Dec 10 19:15:26 SKYNET-UR kernel: aq_ndev_close+0x26/0x2d [atlantic]
    Dec 10 19:15:26 SKYNET-UR kernel: __dev_close_many+0xa1/0xb5
    Dec 10 19:15:26 SKYNET-UR kernel: dev_close_many+0x48/0xa6
    Dec 10 19:15:26 SKYNET-UR kernel: dev_close+0x42/0x64
    Dec 10 19:15:26 SKYNET-UR kernel: aq_set_ringparam+0x4c/0xc8 [atlantic]
    Dec 10 19:15:26 SKYNET-UR kernel: ethnl_set_rings+0x202/0x258
    Dec 10 19:15:26 SKYNET-UR kernel: genl_rcv_msg+0x1d9/0x251
    Dec 10 19:15:26 SKYNET-UR kernel: ? genlmsg_multicast_allns+0xea/0xea
    Dec 10 19:15:26 SKYNET-UR kernel: netlink_rcv_skb+0x7d/0xd1
    Dec 10 19:15:26 SKYNET-UR kernel: genl_rcv+0x1f/0x2c
    Dec 10 19:15:26 SKYNET-UR kernel: netlink_unicast+0x10c/0x19d
    Dec 10 19:15:26 SKYNET-UR kernel: netlink_sendmsg+0x29d/0x2d3
    Dec 10 19:15:26 SKYNET-UR kernel: sock_sendmsg_nosec+0x32/0x3c
    Dec 10 19:15:26 SKYNET-UR kernel: __sys_sendto+0xce/0x109
    Dec 10 19:15:26 SKYNET-UR kernel: ? exc_page_fault+0x351/0x37b
    Dec 10 19:15:26 SKYNET-UR kernel: __x64_sys_sendto+0x20/0x23
    Dec 10 19:15:26 SKYNET-UR kernel: do_syscall_64+0x5d/0x6a
    Dec 10 19:15:26 SKYNET-UR kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
    Dec 10 19:15:26 SKYNET-UR kernel: RIP: 0033:0x15297ca13bc6
    Dec 10 19:15:26 SKYNET-UR kernel: Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb bc 0f 1f 80 00 00 00 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
    Dec 10 19:15:26 SKYNET-UR kernel: RSP: 002b:00007fffc535d018 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
    Dec 10 19:15:26 SKYNET-UR kernel: RAX: ffffffffffffffda RBX: 00007fffc535d090 RCX: 000015297ca13bc6
    Dec 10 19:15:26 SKYNET-UR kernel: RDX: 000000000000002c RSI: 000000000046f3a0 RDI: 0000000000000004
    Dec 10 19:15:26 SKYNET-UR kernel: RBP: 000000000046f2a0 R08: 000015297cae41a0 R09: 000000000000000c
    Dec 10 19:15:26 SKYNET-UR kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 000000000046f340
    Dec 10 19:15:26 SKYNET-UR kernel: R13: 000000000046f330 R14: 0000000000000000 R15: 000000000043504b
    Dec 10 19:15:26 SKYNET-UR kernel: ---[ end trace 91c54fcae68e89eb ]---

     

  9. 45 minutes ago, almulder said:

    Yes, BIOS is set to IOMMU and set to Enabled (not Auto).

     

    However, I had not tried "VFIO allow unsafe interrupts" (did not even notice that option) and now it seems to be working with both being passed through. I will test more once Windows gets installed and then pass through graphics, but the NVMe was found without issue and is installing.

     

    Thanks so much for this info.

     

    Great news! Hopefully you'll be all good then. 👍
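
    (For anyone else landing here: as far as I know, the "VFIO allow unsafe interrupts" toggle corresponds to the vfio_iommu_type1 module parameter; a sketch of the equivalent kernel boot parameter, assuming a stock Unraid syslinux.cfg:)

      # equivalent boot parameter, illustrative only:
      append vfio_iommu_type1.allow_unsafe_interrupts=1 initrd=/bzroot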

  10. 4 hours ago, almulder said:

    That will have no effect on the USB that needs to be passed through. This seems to be an issue where QEMU needs to be updated within the code. Looks like there is another user that dug into this. It just seems odd that other beta 35 users who do passthrough have not reported the issue. I am trying to create a VM and it keeps failing, and nobody else seems to speak up. I was hoping it was an easy fix, as I upgraded my server to also run a daily-driver VM, but I'm unable to set that up yet due to the errors coming up.

     

    Still looking for a solution

    The creation error in your first post relates to IOMMU group 14, which is the disk you're trying to pass through, is it not? So USB might be working, but it just doesn't get past the disk issue.
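
    (A quick way to double-check what shares group 14 is to walk sysfs on the host; a minimal sketch, purely illustrative:)

      # List every IOMMU group and the PCI devices in it, straight from sysfs.
      import os

      root = "/sys/kernel/iommu_groups"
      for group in sorted(os.listdir(root), key=int):
          devices = sorted(os.listdir(f"{root}/{group}/devices"))
          print(f"IOMMU group {group}: {' '.join(devices)}")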

  11. On 12/1/2020 at 5:58 PM, almulder said:

    So does nobody else have issues passing devices through to a VM in beta 35? I really want to get this working so I can make the VM my daily driver and gaming setup. I need to pass through an NVMe that is in group 14, and group 19 for my USB, and then I have an RTX 2070 to pass through for graphics. I have tried to just pass through group 14 and no luck, same error; tried passing through just group 19, again same error. Have not tried graphics yet, as I have heard it is best to wait until after you get the VM up and working correctly.

     

    Also here is my System info.

    [Image: system info]

    You could pass it as a vdisk, or try passing it through via Unassigned Devices via a dev/mnt... path (rough sketch below).

    Also, are you editing an existing VM setup? If so, have you tried a new VM setup with what you want?
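
    If you go the raw-device route, a block device can be handed to the VM in its XML; a minimal sketch, with the by-id path as a placeholder:

      <!-- sketch only: raw disk passthrough; replace the by-id path with your disk -->
      <disk type='block' device='disk'>
        <driver name='qemu' type='raw' cache='writeback'/>
        <source dev='/dev/disk/by-id/ata-YOUR_DISK'/>
        <target dev='hdc' bus='sata'/>
      </disk>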

  12. 57 minutes ago, sittingmongoose said:

    Any idea if @limetech has looked at this? It's been quite a while and I don't see any attempts to look at this problem. It's affecting beta 35 as well. Presumably this issue will follow forward into the next release as well...

    Nothing new other than the C-states, typical current idle, or memory speed tricks to try.

    Yes, I'm on beta 35 now and still having issues! I'm leaning towards a kernel issue, or maybe the Nvidia drivers on the two 2070 Supers in my VMs. I've tried a lot of tweaks; losing track now, lol.

    I have seen posts on other forums about bare-metal Ryzen and Linux rigs having lock-ups as well.

    I'm hoping that the 6.9.0 release will solve it, but who knows.

  13. Update! Looking in my BIOS again, I had the power supply set to "low current idle", so I've now set it to "typical current idle" and I've got almost 4 days of uptime so far! 🤞

    Also, I did some logging of PSU usage just in case. The most I've seen while gaming on 2 VMs, with a 2070 Super on each, was 760 watts, so I don't think I'm hitting the limit on the HX1000i. I am going to get a 1600i as soon as I can, but at £460 it's going to have to wait until the new year, as I will need a bigger UPS as well (another £700-1k) 🤪

  14. 5 hours ago, trurl said:

    What about the RAM recommendations at that link?

    I've got 8x 16GB sticks running at 2133MHz, so well within spec. I was running them at 2666MHz, which is still within the suggested maximum. Dropping the RAM speed was one of the first things I did when I started getting issues, as was a memory test.


     

  15. 38 minutes ago, sittingmongoose said:

    I am having a similar issue. It started with beta 29; beta 25 was good. Beta 25 was the Nvidia build with a Quadro P2000, using hardware accel in my Dockers. It never crashed.

     

    Now on beta 29 and higher, I get crashes every few hours. I've been running for about 3 days now and I have had 12 crashes already. Completely unresponsive, can't access anything; I need to do a power cycle.

     

    Using the Nvidia driver on beta 35, but on beta 29 I was using the Nvidia build. Absolutely nothing in my logs... The server is pretty much useless now; I have no idea when it dies, and don't get any notifications or anything. So the only way I know it's down is someone messaging me saying they can't access Plex.

     

    Attaching Diagnostics and Enhanced Syslogs.


    Oh crap, I don't get it that many times a day. Revert back to a beta that worked for you. Mine's random: it can be 3-5 days, or like 12 days; my longest uptime on beta 35 is 17 days, I think.

     

    What is your server hardware?