hansolo77

Members
  • Posts

    178


  1. Well... I hate to jinx things, that's typically how things are for me... but... I let the memtest run all night and all day today till I got home from work. Successfully completed 4 passes with 0 errors, and was running for like 23.5 hours. I spent the afternoon putting the new motherboard back into the case. Tested memory again with the mobo all screwed in. Got to about 8% (past the dreaded 4% and onto test #3) and shut down. Then I transferred all the M.2 drives, connected all the expansion cards and fan headers, and put the lid on it. Booted up and let it run the memtest again for 1 complete pass. Then I booted into Unraid. So far, everything is going well. I updated all the plugins and dockers. I then shut down the Docker and VM managers and started a parity check. Usually I run the check on the first of the month anyway but I missed it. Plus, it auto-started a parity check as soon as Unraid was loaded (remember, it was crashing whenever I started the preclear, so I think the boot-up parity check was just clearing the dirty bit). Once the parity check finishes, I'm going to reboot (so it's a clean shutdown and clean startup) and then (( FINGERS CROSSED )) try to run another preclear.
  2. Yeah his videos are really helpful. Glad it worked for you and you're up and running.
  3. I bought new RAM from Amazon on Thursday. Was supposed to be 2-day, delivery on Saturday. Never got it, said it was delayed, then delayed again then delayed AGAIN. I just now got it. It's running a memtest now but so far the entire set looks ok. It made it past the 4% mark at least. Been running about an hour and is up to 35% pass. Hope it's good. Gonna let it run all night and through tomorrow until I get home. Fingers crossed.
  4. Heh. I know right? I'm baffled. I told my brother, if different RAM has the same problem.. I'm done. It's not worth continuing with this hardware design. My next build won't be "gamer" centered. I had envisioned one day building a VM that I could stream games from onto a smaller SBC in a different room. As it is, I've not really been able to keep a stable system running for more than a year, just hosting a PLEX server. The next time I build a server, I'm going back to server-grade parts. SuperMicro/ASRockRack board, Xeon or similar CPU, ECC RAM. It's just a lot to buy from scratch. But it's definitely something I should probably start researching if new RAM doesn't fix it. I can't afford to buy a replacement CPU.
  5. Well... good news / bad news time.
     Good News: Replacement motherboard arrived.
     Bad News: Might not have been the motherboard.
     Details: The motherboard arrived late last night so I couldn't play around with it until after I got off work today. I spent a couple of hours being really gentle with it. I kept it on the box on top of its anti-static bag. I removed the CPU from the old motherboard and diligently cleaned off the old thermal paste. I also removed the paste from the heatsink. I ordered some new goop that arrived a week ago, and applied that. Hooked up the power, connected a video card AND... nothing. At least not at first.
     Prior to the swap (I still haven't screwed it in yet...) I did another test of each RAM stick one by one and identified which ones were good and bad. The first thing I did was put just 1 stick in the designated SINGLE slot, and tried to boot. The motherboard has an LED indicator for various boot stages, and it immediately indicated a problem with the DRAM. So I popped out the stick and put in the other one I designated as GOOD. THAT allowed me to boot. I went into the BIOS and updated it first thing (I remembered having a problem with the original BIOS not working well with the CPU when I first built my rig so I wanted to make sure I updated it). Once the update was done I went back into the BIOS and tweaked some settings, like making sure the fan speeds were all on FULL.
     This is a bare build still at this point. Nothing is connected except a USB keyboard/mouse dongle, the video card, CPU and RAM. I plugged in the Unraid USB and went straight into Memtest. Within 10 seconds, it reached like 4% pass then FAILED. F^CK. I rebooted with one of the "bad" sticks, and again the LED indicated a DRAM issue. So I tried the LAST stick, labeled BAD, but surprisingly it's up to 37% PASS at this point. All the others are failing at around that 4% mark. I don't know what happened. Something took a bite out of my RAM though. Unless this kind of behavior is typical with a bad CPU?
     There does appear to be some weird oily/greasy residue around the solder pins under the old motherboard where the RAM is. I checked and it's NOT on the new motherboard. Maybe I had a leak somewhere that jacked up the slots and then with it running 24/7 it damaged the sticks over time. In any case, I'm going to look and see if I can return/exchange the newer RAM I have. It's probably too late (3+ years) for the old RAM to get replaced. If I can't do an exchange with Amazon on the new RAM, I'm going to have to wait for my Income Tax Refund to buy new.
     I'm tempted to get something else. I mean, I'm really good with computers and building; I doubt it had been running good for over a year then stopped working because of a compatibility thing. I'm not even overclocking these guys.. this is the base values (not even using the XMP setting). But, these are my first G.Skill sticks. Perhaps a replacement set should be a different brand? What are your thoughts? I know my friend @Idolwild built his server with essentially the same components and hasn't had any trouble. He has the same motherboard, CPU, and RAM. So should I tempt fate and buy another matching set of the RAM that I've been using? Should I go with something else like Kingston or Corsair? I'm lost.
     Also.. is there a CPU tester like MEMTEST? I'd like to try and verify it's still ok, and the problem is actually with the RAM. Thanks!
  6. Yeah I know.. I was generalizing. I have a RAID controller but it's not using any RAID settings, just straight JBOD. Even the BIOS detects and lists all the drives available for choosing for boot, although I'm not using it. As for the VMs, I think all I'm passing through is network, but even that might be something the VM manager does. Without having access to Unraid until my replacement motherboard arrives, I can't know for sure. If I can get Unraid to load with the new motherboard, I think I'll be good. It'll be upsetting if I have to rebuild my server data, but at least I know HOW. I was just wondering if it would all of a sudden go to boot Unraid then halt at some issue due to the hardware change. At first run, the only change would be the motherboard. The CPU and RAM and all the other stuff should be all the same.
  7. Not expecting delivery for a while (January 21st+). Curious.. with a new motherboard, even if it's the same model, would it interfere with Unraid? Would it be a simple drop-in plug and play, or is it going to require rebuilding Unraid from scratch again? I know the system runs off the flash drive, and it's thumbprinted... just curious if there are any complications with swapping out the motherboard. Also curious, though I don't see how, would it complicate things if I had to replace the CPU too? I'm not sure how it all works.
  8. I’ve been testing single sticks in each slot. So far I’ve got one that passes in slot 2 but in no others, and I’m on my last stick. I’ve gone ahead and ordered a replacement motherboard. I agree it’s very unlikely that all my RAM is bad and only 1 works in 1 slot. I’ll try this method first and if the ram still fails with the new board I guess I’ll try buying another set.
  9. I'm not sure I can use a different slot. The manual says to use a specific slot for single sticks. I will try it though.
  10. Yeah I'm not sure what my problem is now. The first RAM stick (one of the newer ones) completed 1 pass cycle with no fails. Then I swapped in the 2nd new stick and it failed almost immediately again (around 4-5% complete). I then swapped in one of the old sticks, it made it up to like 20% then I stopped watching it so it could do its thing. The first stick took about an hour to complete. After about 45 minutes I went back to check on the 3rd stick, it was about 86% complete but had a whole bunch of fails. I stopped the check and swapped in the last stick. It, too, failed at about 4-5% completion. So.. only 1 good stick? I swapped the first one back in again and re-tested it. This time, it failed at like 5% too. Very flaky. 1 stick out of 4 passed but then failed its 2nd time. Could this indicate then that the RAM slot might be bad...aka the motherboard needs replaced? I feel like the whole BIOS and CPU issue is resolved for the most part... I've been able to get solid reboots into MEMTEST fine ever since I reseated the CPU. I never noticed before but maybe it came loose or moved during my move into the apartment. I just wish I knew what to save up and buy.
  11. I've been playing around a bit, hoping and praying I don't have to buy a new setup. At first I wanted to try single sticks, but couldn't get the system to boot up. The DEBUG LED indicated a problem with the CPU at first. So I pulled it off and checked it out. Everything looked great. Re-attached it and tried again. This time I was getting a number indicating "22", which isn't in the manual. So I googled it and lots of people are saying "22" is related to RAM. Ok, so the CPU problem went away, back to RAM. I put all the sticks back in, and tried again. I could get it to boot to BIOS, but no further. After making some changes (down clocking the speed of the RAM) I saved and rebooted but it never fully rebooted...just sat there. I've spent the last hour or so trying to get the settings to save. All I've been able to do is get it to boot into BIOS after doing a CMOS reset.
      I might have figured something out though. By having just 1 stick in (and in the slot the manual says to use for just 1.. another problem I was having ^_^) I was able to get a different DEBUG LED code, that this time indicated a problem with the PCI. I had another "AH HA!" moment. Prior to moving, or around that time, I WAS attempting to solve a continuing problem I had where my Parity Drives always reported errors on their monthly checks. Turned out, the motherboard didn't like those drives connected to its SATA headers. I now have them connected to the SAS backplane. Before I reached that conclusion, I had bought a separate SATA controller card, in an attempt to solve the problem so I wouldn't lose a hotswap bay. It didn't work, but I was still using the card to control an SSD drive I was using for Plex metadata. Since I got an error this time reporting something with PCI, I remembered that the expansion card WAS something new I had added. Maybe it's failing? I pulled the card out, and with the 1 stick of RAM still in, the system booted up just fine, with no issues or delays!
      I went ahead and started a MEMTEST with that 1 stick, and so far, it's up to 35% with no issues. Much better than the 5% then failures I had before. I'm going to go ahead and play it safe and let it complete at least 1 pass on this stick, then I'll swap it with another one and test all 4. Man I'm going to be so relieved if that's the problem... It could explain why the parity drives gave me issues when they were connected to it too, if the card itself is bad. I don't think I ever connected the SSD I currently have on it to the motherboard directly, the only stuff I ever had on the motherboard was the parity drives. So if the RAM tests work out, I'll hold tight until I can get the new drive precleared and then work on the SSD. Hope I'm not jinxing myself. < fingers crossed >
  12. NOO! Don't say that! That's even MORE money... Are there any tools or something I can do to determine if that's the case? Like a memtest for cpu?
  13. Just had another crash. I stopped the array and disabled Docker and VMs entirely. Everything was good for about 30 minutes so I tried to preclear again. I was able to see it happen with the open log window. Here's the entire crash.
      Jan 6 08:43:57 Kyber kernel: general protection fault, probably for non-canonical address 0xff7f88816d33d0b8: 0000 [#1] PREEMPT SMP NOPTI
      Jan 6 08:43:57 Kyber kernel: CPU: 15 PID: 0 Comm: swapper/15 Tainted: P O 6.1.64-Unraid #1
      Jan 6 08:43:57 Kyber kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C35/MEG X570 UNIFY (MS-7C35), BIOS A.F0 03/01/2023
      Jan 6 08:43:57 Kyber kernel: RIP: 0010:blkg_to_tg+0xf/0x1c
      Jan 6 08:43:57 Kyber kernel: Code: 00 00 00 74 05 e8 85 a8 46 00 48 83 c4 38 5b 5d 41 5d 41 5e 41 5f e9 8e 8e 7f 00 48 85 ff 48 89 f8 74 0f 48 63 15 3f e2 f0 00 <48> 8b 84 d7 b8 00 00 00 e9 72 8e 7f 00 0f 1f 44 00 00 31 c0 48 85
      Jan 6 08:43:57 Kyber kernel: RSP: 0018:ffffc9000056ce68 EFLAGS: 00010286
      Jan 6 08:43:57 Kyber kernel: RAX: ff7f88816d33d000 RBX: ffff88878ff1c840 RCX: 00000000802a0014
      Jan 6 08:43:57 Kyber kernel: RDX: 0000000000000000 RSI: ffff88878ff1c840 RDI: ff7f88816d33d000
      Jan 6 08:43:57 Kyber kernel: RBP: ffff88878ff1c840 R08: ffff88878ff1c900 R09: 00000000802a0014
      Jan 6 08:43:57 Kyber kernel: R10: ffff88878ff1c900 R11: 0000000000032140 R12: 0000000000100001
      Jan 6 08:43:57 Kyber kernel: R13: ffff88878ff1c840 R14: 0000000000001000 R15: ffff888171cb2d00
      Jan 6 08:43:57 Kyber kernel: FS: 0000000000000000(0000) GS:ffff889faebc0000(0000) knlGS:0000000000000000
      Jan 6 08:43:57 Kyber kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Jan 6 08:43:57 Kyber kernel: CR2: 000014975b7b6710 CR3: 000000000220a000 CR4: 0000000000350ee0
      Jan 6 08:43:57 Kyber kernel: Call Trace:
      Jan 6 08:43:57 Kyber kernel: <IRQ>
      Jan 6 08:43:57 Kyber kernel: ? __die_body+0x1a/0x5c
      Jan 6 08:43:57 Kyber kernel: ? die_addr+0x38/0x51
      Jan 6 08:43:57 Kyber kernel: ? exc_general_protection+0x30f/0x345
      Jan 6 08:43:57 Kyber kernel: ? asm_exc_general_protection+0x22/0x30
      Jan 6 08:43:57 Kyber kernel: ? blkg_to_tg+0xf/0x1c
      Jan 6 08:43:57 Kyber kernel: blk_throtl_bio_endio+0x28/0x154
      Jan 6 08:43:57 Kyber kernel: bio_endio+0x10f/0x131
      Jan 6 08:43:57 Kyber kernel: blk_update_request+0x22f/0x2e5
      Jan 6 08:43:57 Kyber kernel: ? _base_process_reply_queue+0x138/0xedd [mpt3sas]
      Jan 6 08:43:57 Kyber kernel: scsi_end_request+0x27/0xf0
      Jan 6 08:43:57 Kyber kernel: scsi_io_completion+0x156/0x457
      Jan 6 08:43:57 Kyber kernel: blk_complete_reqs+0x41/0x4c
      Jan 6 08:43:57 Kyber kernel: __do_softirq+0x129/0x288
      Jan 6 08:43:57 Kyber kernel: __irq_exit_rcu+0x5e/0xb8
      Jan 6 08:43:57 Kyber kernel: common_interrupt+0x9b/0xc1
      Jan 6 08:43:57 Kyber kernel: </IRQ>
      Jan 6 08:43:57 Kyber kernel: <TASK>
      Jan 6 08:43:57 Kyber kernel: asm_common_interrupt+0x22/0x40
      Jan 6 08:43:57 Kyber kernel: RIP: 0010:native_safe_halt+0x7/0xc
      Jan 6 08:43:57 Kyber kernel: Code: 7c ff 85 c0 74 0b 65 81 25 1c d7 79 7e ff ff ff 7f 5b 5d e9 55 2c 38 00 e8 8e 88 7d ff f4 e9 4a 2c 38 00 e8 83 88 7d ff fb f4 <e9> 3e 2c 38 00 0f 1f 44 00 00 53 e8 61 4c ff ff 31 ff 89 c6 e8 fa
      Jan 6 08:43:57 Kyber kernel: RSP: 0018:ffffc900001cfe58 EFLAGS: 00000246
      Jan 6 08:43:57 Kyber kernel: RAX: 0000000000004000 RBX: 0000000000000001 RCX: 000000000800f8c4
      Jan 6 08:43:57 Kyber kernel: RDX: ffff889faebc0000 RSI: ffff8881016b1c00 RDI: ffff8881016b1c64
      Jan 6 08:43:57 Kyber kernel: RBP: ffff8881016b1c64 R08: 000000000800f8c4 R09: 0000000000000002
      Jan 6 08:43:57 Kyber kernel: R10: 0000000000000020 R11: 0000000000000187 R12: ffff88810935b400
      Jan 6 08:43:57 Kyber kernel: R13: ffffffff823237a0 R14: ffffffff82323820 R15: 0000000000000000
      Jan 6 08:43:57 Kyber kernel: ? native_safe_halt+0x5/0xc
      Jan 6 08:43:57 Kyber kernel: arch_safe_halt+0x5/0xb
      Jan 6 08:43:57 Kyber kernel: acpi_idle_do_entry+0x2a/0x43
      Jan 6 08:43:57 Kyber kernel: acpi_idle_enter+0xbc/0xd0
      Jan 6 08:43:57 Kyber kernel: cpuidle_enter_state+0xc9/0x202
      Jan 6 08:43:57 Kyber kernel: cpuidle_enter+0x2a/0x38
      Jan 6 08:43:57 Kyber kernel: do_idle+0x18d/0x1fb
      Jan 6 08:43:57 Kyber kernel: cpu_startup_entry+0x2a/0x2c
      Jan 6 08:43:57 Kyber kernel: start_secondary+0x101/0x101
      Jan 6 08:43:57 Kyber kernel: secondary_startup_64_no_verify+0xce/0xdb
      Jan 6 08:43:57 Kyber kernel: </TASK>
      Jan 6 08:43:57 Kyber kernel: Modules linked in: md_mod xt_nat veth nvidia_uvm(PO) xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag nct6775 nct6775_core hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc qlcnic r8169 realtek nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) edac_mce_amd edac_core intel_rapl_msr intel_rapl_common iosf_mbi video drm_kms_helper drm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 mpt3sas sha256_ssse3 btusb sha1_ssse3 btrtl btbcm aesni_intel btintel i2c_piix4 backlight crypto_simd syscopyarea cryptd wmi_bmof mxm_wmi bluetooth rapl sysfillrect raid_class i2c_core k10temp nvme sysimgblt joydev
      -----------------------------------------
      Ok I have discovered a new problem. I can't get the preclear to run with the array off or on, dockers and VMs off or on. So I went back to the beginning as you suggested and started a MEMTEST. Turns out, it gets about 5% complete then starts throwing up red errors indicating failure. I have 4 matching sticks (although they are 2 sets of 2 with different manufacturing dates). I removed the 2 newer sticks and tested with the old RAM. They failed at the same point. Removed those and swapped in the newer sticks. They failed at the same point too and actually threw up a banner I had to clear before I could continue. So the problem might stem from the very root of the system, starting with the RAM. Unfortunately this is a major setback for me since I don't have a lot of free cash. Income tax season is coming, I might have to wait to get this fixed. But I pretty much know at this point that bad RAM needs fixed before further troubleshooting can be done. Just for gits and shiggles I did alcohol wipe the contacts on the RAM sticks, and blew dust out of the slots, but it made no difference. <sigh>
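One thing that stands out in the crash log above is the faulting address: RAX and RDI hold ff7f88816d33d000, while the other kernel pointers in the same dump start with ffff888. The sketch below is a rough way to check whether a non-canonical address like that is exactly one flipped bit away from a valid one. It assumes 48-bit virtual addressing (4-level paging), and the function names are made up for illustration. A single-bit difference is only a hint that fits with the failing MEMTEST runs; it can't tell RAM, CPU, or a software bug apart on its own.

```python
import re
from typing import List, Optional

# Assumption: 48-bit virtual addresses (4-level paging), so bits 63..47
# must all be equal for an address to be canonical.
CANONICAL_BITS = 48


def is_canonical(addr: int, bits: int = CANONICAL_BITS) -> bool:
    """Return True if the upper bits (63 down to bits-1) are all 0 or all 1."""
    top = addr >> (bits - 1)
    all_ones = (1 << (64 - bits + 1)) - 1
    return top == 0 or top == all_ones


def single_bit_repairs(addr: int) -> List[int]:
    """Bit positions whose flip alone would make addr canonical."""
    return [b for b in range(64) if is_canonical(addr ^ (1 << b))]


def faulting_address(log_line: str) -> Optional[int]:
    """Pull the address out of a 'general protection fault' log line."""
    m = re.search(r"non-canonical address (0x[0-9a-fA-F]+)", log_line)
    return int(m.group(1), 16) if m else None


if __name__ == "__main__":
    line = ("Jan 6 08:43:57 Kyber kernel: general protection fault, probably "
            "for non-canonical address 0xff7f88816d33d0b8: 0000 [#1] PREEMPT SMP NOPTI")
    addr = faulting_address(line)
    print(hex(addr), single_bit_repairs(addr))  # 0xff7f88816d33d0b8 [55]
```

For the address in this log, flipping bit 55 alone gives 0xffff88816d33d0b8, which looks like the other pointers in the register dump.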