mankey54

  1. @Vr2Io and everyone, I want to give an update. My sad saga of trying to fix my issues has finally concluded. Thanks to you, Vr2Io, for helping me determine that the issue was with my CPU. I've since bought a replacement (Ryzen 5 3600X), and my system is operational again; it has been running for 24 hours and counting. It makes sense that this was a hardware issue, since nothing in my configuration over the last 6 months led me to believe it was software, but I started with software anyway. By trial and error I replaced my memory, then the motherboard, and the issue was still not resolved; if anything it got worse. Once I replaced the CPU, the system was operational again. Since my CPU is under warranty, I've started the process to request a new one. Now that I have all this extra hardware, it's time to build a second system :). Thank you, everyone, for your support in this. Much appreciated.
  2. @Vr2Io Hi, thank you for providing this bit of Memtest. I ran this memory test and found many errors. I didn't see these errors with the Memtest that Unraid runs by default. If I'm seeing errors with the multi-core Memtest but not with the single-core one, is it possible that it's a CPU problem? Or do I still have a memory error? At the moment I'm running this Memtest benchmark with my new pair of 16GBx2 sticks of RAM. I will try again after removing them and putting back my previous 32GBx1 stick. For reference, the RAM I am using with my Ryzen 3800X CPU: Set 1: 16GBx2 DDR4 2400MHz Corsair Vengeance LPX. Set 2: 32GBx1 DDR4 2400MHz Corsair Vengeance LPX.
  3. @Squid @trurl @John_M OK, an update on my never-ending saga. Since my last update I tried all the configurations that were suggested, and none remedied my issue. I had pretty much chalked it up to hardware, so I bought a replacement motherboard and memory, exactly the same hardware as before. After I replaced them, at first boot the issue actually became worse: the system would not complete its boot process after starting from the Unraid boot screen. It would simply run, restart to the BIOS prompt, reach the Unraid boot screen, and try to start up, only to restart again. Next, thinking something might be wrong with my flash drive, I rebuilt it as a brand-new install without restoring the config folder. On startup it did the same thing as before. At that point I suspected other hardware, so I started removing the PCI cards I had installed. I may have forgotten to mention that I have other hardware installed, but it had all worked with Unraid without issues for over a year, so I thought nothing of it. With a new motherboard there might be issues, so I tested that. To give an idea of what was attached: 2 graphics cards (Radeon 7850, GTX 1060) and 2 USB 3.0 controller cards (Rosewill, and another whose brand I don't remember). I removed all the cards except one graphics card (Radeon 7850). After this the system would boot up and... I am able to BOOT UP! I then safely shut it down and booted it back up a couple of times to confirm that booting was stable. I also restored my config folder and confirmed that I was still able to boot. So you may be thinking, yay, it's over. Sadly, no; however, I now have some details that didn't appear before. I started the array and then started only Docker. I decided to delete the Docker image and rebuild my containers from templates.
I let the system run for a bit and saw segfaults in the syslog, same as before, and the system locked up to the point where I had to hard reboot. I am still able to start up the system, but this time I've disabled Docker and kept the array running; it has currently been running for 35 minutes and counting. However, in my notifications I got an error that I had not seen before and that was very helpful: "Fix Common Problems: Error: Machine Check Events detected on your server". I went to Fix Common Problems, and the message now tells me the following: "Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged". The system is still running, but for how long I'm not sure; I will keep monitoring it. I've posted my diagnostics with this update in the hope that you can help me. I hope the mcelog output is present and that I did it properly. Thank you all for any help you can provide. supergrid-diagnostics-20210411-1940.zip
  4. @John_M It's an RMx Series™ RM1000x (1000 Watt, 80 PLUS), and I've had it since 2019, so not yet 2 years.
  5. @trurl I want to give an update. I've replaced my USB flash drive and rebuilt it following the instructions: a new install of Unraid 6.9.1, with all content from the config folder copied to the new drive. The configuration transferred over; however, when I started the array and began a Parity Check, the system crashed and rebooted again. I was monitoring the syslog, but no error was captured at the point of the crash. I've done a couple of things afterward hoping to isolate the problem, listed here in order with their results:
     1. Rebuilt a new flash drive with the previous version, 6.8.3, copying over the config folder. Result: the configuration was recognized. On starting the array and a Parity Check, the system would crash and reboot. Same symptom.
     2. Made the BIOS configuration change suggested by @John_M, who helped me identify the setting on my mobo: set the power setting to "Typical Current Idle". Result: boot-up into Unraid was the same as in step 1. This time I did not let the array start; I just let the system stabilize before enabling more features. About 1-2 hours later the system crashed and rebooted again.
     3. Upgraded the BIOS. The version I was running was the initial release (F3 for the Gigabyte X570 mobo), so I upgraded to the next release (F4). Result: boot-up into Unraid was the same as in step 1. I did not let the array start and let the system stabilize. Again, after 1 hour the system crashed and rebooted.
     Things I've tested so far:
     - Upgraded Unraid to 6.9.1
     - Downgraded Unraid to 6.8.3
     - Tested memory (Memtest86+)
     - Reseated the RAM and also moved it to another slot
     - Changed the power setting to "Typical Current Idle"
     - Cleared the BIOS (physically on the mobo, with a screwdriver touching the two points)
     - Upgraded the BIOS to the next version (following Gigabyte's procedure)
     At this point I am out of configuration options, and I can only deduce that it is a hardware problem. I'm in the process of RMAing my mobo, since the RAM tests seem successful. I don't have another system to transplant my hard drives into, so I will have to wait until I get a new mobo. However, if you have any other things to test, I am open to it.
  6. @trurl, sure, I can start up the array. Since I'm using a new USB flash drive, I will have to transfer the key to it. Regarding Memtest, I've actually run it a few times already, and I reseated the memory stick for good measure. My last memory test, run 2 days ago, completed 10 consecutive passes without any failures. My RAM is a Corsair Vengeance DDR4 PC2400 32GBx1. Update on the current run: I had the system running (6.8.1, SAFE mode, array stopped) for about 40 minutes and the segmentation faults returned. Attached are the diagnostics from when they first appeared. The system continues to run a bit longer and then crashes again. I will try again with the array started, pull the diagnostics, and repost. @John_M, thank you for your steps on the C-state settings. After I try these steps from trurl, I will look into the BIOS settings you suggested. supergrid-diagnostics-20210329-1555.zip
  7. Hi trurl. Thank you for your reply. I was able to restore a backup from 2 weeks ago; this version is 6.8.1. I let it run normally and it still has the crashing problem. Following your direction to run in SAFE mode, I am now doing that with the backup (6.8.1). It has been running for 5 minutes with no crash yet. I'm keeping the System Log window open to capture anything. What should I do from here? I went ahead and pulled another diagnostic just in case. supergrid-diagnostics-20210329-1520.zip
  8. Hi there, I have an update. Unfortunately the problem is getting worse. The system is now in a constant crash loop: Unraid starts up, and within 5-10 minutes the system crashes and restarts. I was able to pull the diagnostics. I've not yet created a new USB flash drive, as I am not able to keep the system up long enough to even pull a backup. I do have an older backup from 2 weeks ago that I will try to restore. However, that backup was captured during the initial issue, not before the issue first began. If anyone can help me, I would greatly appreciate it. Any troubleshooting steps would help me out a lot. I'm a little lost now. supergrid-diagnostics-20210329-1419.zip
  9. Hi Chat and Nior, thank you for your response. Regarding the C-state setting: from what I read, it is only mentioned as something to disable if you are running an earlier Ryzen 1xxxx. However, I am running a later Ryzen 3800X. That said, I looked in my BIOS and was not able to find the C-state settings. I've tried searching on Google for this with no luck. It's worth noting again that I've been running this hardware configuration for well over a year and have only recently encountered these crashes. No changes have been made to my system, which makes me believe it may be a hardware problem. I wonder if the flash drive holding the Unraid settings could be damaged or corrupt? I would like to investigate the C-state settings if anyone can share how to navigate to them in the BIOS. I have a Gigabyte X570 Gaming mobo. Since my last post I've gone back to test the RAM and completed 10 passes without error. I restarted the system and tried to complete a parity check; however, I now received a new error. See the attached screenshot. As next steps I will transfer to a new flash drive to see if it helps, and I am also considering rolling back to 6.8.3 or 6.8.1. It's also worth noting again that I first had this problem when running 6.8.3, but it had been running stable under 6.8.3 for over 6 months, and I only recently upgraded to 6.9.1 in the hope of fixing the problem.
  10. Hi Trurl. I've not made any of the changes in the FAQ. It talks about RAM speeds. I use 1 stick of 32GB DDR4 2400. I've not done any tuning on the motherboard side and I don't overclock. I've tested the RAM multiple times and let it run 2 nights straight, with all passes succeeding and no failures. I'd also like to note that I've been running this setup for over a year now and this issue has only recently started occurring. I've not overclocked anything. Previously I was on Unraid 6.8.3, and I only upgraded to 6.9.1 after the issue arose, in the hope that it would fix the problem.
  11. Hi there, I was able to capture diagnostics once the segfault appeared. Once I see this segfault, the system crashes within a few minutes. Please note that I have also stopped the array, but this error persists.
      Mar 20 21:27:22 SuperGrid kernel: php[16009]: segfault at 2881cef8 ip 000015084ad30053 sp 00007ffc2881cd90 error 4 in ld-2.30.so[15084ad26000+20000]
      Mar 20 21:27:22 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00
      Troubleshooting I have done so far, in this order:
      - Upgraded to the latest Unraid, 6.9.1
      - Completed a memory test (10 consecutive passes with 0 failures)
      - Turned off VMs and Docker to isolate whether it was related to a VM or the Docker image
      - Rebuilt the Docker image
      - Stopped the array
      supergrid-diagnostics-20210320-2131.zip
  12. Hi there, I have an update on this issue. The problem persists. I find it difficult to capture the error; however, at one point while troubleshooting I was able to capture it. I see many segfault errors. Below are the logs I was able to capture:
      Mar 20 17:59:34 SuperGrid kernel: Call Trace:
      Mar 20 17:59:34 SuperGrid kernel: swake_up_one+0x16/0x24
      Mar 20 17:59:34 SuperGrid kernel: rcu_nocb_gp_kthread+0x23c/0x433
      Mar 20 17:59:34 SuperGrid kernel: ? rcu_bind_current_to_nocb+0x38/0x38
      Mar 20 17:59:34 SuperGrid kernel: kthread+0xe5/0xea
      Mar 20 17:59:34 SuperGrid kernel: ? __kthread_bind_mask+0x57/0x57
      Mar 20 17:59:34 SuperGrid kernel: ret_from_fork+0x22/0x30
      Mar 20 17:59:34 SuperGrid kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap veth xt_nat xt_MASQUERADE iptable_nat nf_nat xfs md_mod hwmon_vid ip6table_filter ip6_tables iptable_filter ip_tables bonding edac_mce_amd kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel r8169 aesni_intel crypto_simd cryptd nvme wmi_bmof realtek i2c_piix4 k10temp i2c_core nvme_core ahci glue_helper wmi ccp rapl libahci thermal button acpi_cpufreq
      Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33
      Mar 20 17:59:34 SuperGrid kernel: ---[ end trace 8f246789f2fd9320 ]---
      Mar 20 17:59:34 SuperGrid kernel: RIP: 0010:update_nohz_stats+0x0/0x4e
      Mar 20 17:59:34 SuperGrid kernel: Code: 40 84 ed 74 0a 31 f6 4c 89 ff e8 56 de ff ff 48 8b 74 24 08 48 83 c4 18 4c 89 ff 5b 5d 41 5c 41 5d 41 5e 41 5f e9 6b e8 67 00 <83> 7f 20 00 74 45 53 48 89 fb 8b bf 48 0a 00 00 89 f2 48 c7 c6 80
      Mar 20 17:59:34 SuperGrid kernel: RSP: 0018:ffffc9000043fb50 EFLAGS: 00010046
      Mar 20 17:59:34 SuperGrid kernel: RAX: 0000000000000005 RBX: ffff11091e962380 RCX: 0000000000000005
      Mar 20 17:59:34 SuperGrid kernel: RDX: ffff888100022380 RSI: 0000000000000000 RDI: ffff11091e962380
      Mar 20 17:59:34 SuperGrid kernel: RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000005
      Mar 20 17:59:34 SuperGrid kernel: R10: 0000000000000000 R11: ffff88881e962400 R12: ffffc9000043fc08
      Mar 20 17:59:34 SuperGrid kernel: R13: ffffc9000043fc88 R14: ffffc9000043fd28 R15: ffff88810094f9c0
      Mar 20 17:59:34 SuperGrid kernel: FS: 0000000000000000(0000) GS:ffff88881e9c0000(0000) knlGS:0000000000000000
      Mar 20 17:59:34 SuperGrid kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Mar 20 17:59:34 SuperGrid kernel: CR2: ffffffff81071b33 CR3: 000000000200c000 CR4: 0000000000350ee0
      Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873106, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
      Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
      Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873186, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
      Mar 20 17:59:36 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(137) ERRNO=Network is unreachable
      Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873196, 0] ../../source3/nmbd/nmbd_packets.c:179(send_netbios_packet)
      Mar 20 17:59:36 SuperGrid nmbd[9868]: send_netbios_packet: send_packet() to IP 172.17.255.255 port 137 failed
      Mar 20 17:59:36 SuperGrid nmbd[9868]: [2021/03/20 17:59:36.873206, 0] ../../source3/nmbd/nmbd_nameregister.c:581(register_name)
      Mar 20 17:59:36 SuperGrid nmbd[9868]: register_name: Failed to send packet trying to register name #001#002__MSBROWSE__#002<01>
      Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.885890, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
      Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
      Mar 20 17:59:44 SuperGrid nmbd[9868]:
      Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.1.8
      Mar 20 17:59:44 SuperGrid nmbd[9868]:
      Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
      Mar 20 17:59:44 SuperGrid nmbd[9868]: [2021/03/20 17:59:44.886001, 0] ../../source3/nmbd/nmbd_become_lmb.c:397(become_local_master_stage2)
      Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
      Mar 20 17:59:44 SuperGrid nmbd[9868]:
      Mar 20 17:59:44 SuperGrid nmbd[9868]: Samba name server SUPERGRID is now a local master browser for workgroup WORKGROUP on subnet 192.168.122.1
      Mar 20 17:59:44 SuperGrid nmbd[9868]:
      Mar 20 17:59:44 SuperGrid nmbd[9868]: *****
      Mar 20 17:59:56 SuperGrid kernel: sensors[10601]: segfault at 7ffe8754b20b ip 00007ffe8754b20b sp 00007ffcaf35ba10 error 14
      Mar 20 17:59:56 SuperGrid kernel: Code: Unable to access opcode bytes at RIP 0x7ffe8754b1e1.
      Mar 20 18:00:34 SuperGrid nmbd[9868]: [2021/03/20 18:00:34.939702, 0] ../../source3/libsmb/nmblib.c:922(send_udp)
      Mar 20 18:00:34 SuperGrid nmbd[9868]: Packet send failed to 172.17.255.255(138) ERRNO=Network is unreachable
      Mar 20 18:00:35 SuperGrid kernel: plugin[10819]: segfault at 4129808 ip 0000150b83470053 sp 00007ffe041296a0 error 4 in ld-2.30.so[150b83466000+20000]
      Mar 20 18:00:35 SuperGrid kernel: Code: ef c0 48 83 7c 24 08 00 48 89 44 24 40 0f 29 44 24 50 74 11 f7 84 24 d0 00 00 00 fa ff ff ff 0f 85 e7 0c 00 00 48 8b 44 24 18 <49> 8b 0c 24 48 83 bc 24 d8 00 00 00 00 4c 8b 08 0f 85 67 02 00 00
  13. Hi Squid. I wanted to give you an update on my troubleshooting progress. I updated Unraid to 6.9.1 and completed a memory test (10 passes without error). I was able to restart Unraid and complete a Parity Check. I have now started Docker and continue to monitor the server. I'm slowly enabling all the settings; I still need to enable VMs and am waiting for another day of stability before I continue with that. The server has been up and running for 1.5 days now, and I will continue to monitor it. Thank you so much for your support.
  14. Thank you, Squid. I just read the "Read me First" message you mentioned. Here you go. Also, I need to make a correction: the version of Unraid is 6.8.1. supergrid-diagnostics-20210310-1450.zip
  15. Hi there, community. I have an update on this issue. Afterward I tried leaving Unraid running with the array stopped. It would keep running for maybe 1 hour and then crash with a message similar to before. To give context, I am running the previous stable release, which I believe is 6.8.3. I've not yet upgraded to the latest stable release.
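
The kernel segfault lines quoted in the posts above can be summarized with a short script. What follows is only a minimal sketch, not part of the original thread: the function names (`parse_segfaults`, `decode_error`, `count_by_process`) are hypothetical, the sample lines are copied from the logs above, and the decoding assumes the usual x86 page-fault error-code bit layout with the code printed in hexadecimal. It shows which processes faulted and what kind of access failed; it does not diagnose the cause on its own.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted in the posts, for example:
# "kernel: php[16009]: segfault at 2881cef8 ip ... sp ... error 4 in ld-2.30.so[...]"
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\s\[]+))?"
)


def decode_error(code):
    """Decode the page-fault error code (assumed x86 bit layout)."""
    return {
        "page_present": bool(code & 0x1),        # 0 = page not mapped
        "write": bool(code & 0x2),               # 0 = read access
        "user_mode": bool(code & 0x4),           # fault raised from user space
        "instruction_fetch": bool(code & 0x10),  # fault while fetching code
    }


def parse_segfaults(text):
    """Return one dict per segfault line found in the given syslog text."""
    faults = []
    for m in SEGFAULT_RE.finditer(text):
        code = int(m.group("err"), 16)  # assumption: the kernel prints hex
        faults.append({
            "process": m.group("proc"),
            "pid": int(m.group("pid")),
            "object": m.group("obj"),  # None when no "in <object>" suffix
            "error": code,
            "flags": decode_error(code),
        })
    return faults


def count_by_process(faults):
    """Count how many segfaults each process name produced."""
    return Counter(f["process"] for f in faults)


# Sample lines copied from the posts above.
SAMPLE = """\
Mar 20 17:59:56 SuperGrid kernel: sensors[10601]: segfault at 7ffe8754b20b ip 00007ffe8754b20b sp 00007ffcaf35ba10 error 14
Mar 20 18:00:35 SuperGrid kernel: plugin[10819]: segfault at 4129808 ip 0000150b83470053 sp 00007ffe041296a0 error 4 in ld-2.30.so[150b83466000+20000]
Mar 20 21:27:22 SuperGrid kernel: php[16009]: segfault at 2881cef8 ip 000015084ad30053 sp 00007ffc2881cd90 error 4 in ld-2.30.so[15084ad26000+20000]
"""

if __name__ == "__main__":
    faults = parse_segfaults(SAMPLE)
    for f in faults:
        print(f["process"], hex(f["error"]), f["object"], f["flags"])
    print(count_by_process(faults))
```

To use it on a real system, pass the contents of the syslog file from a diagnostics zip to `parse_segfaults`. Faults spread across unrelated programs, as in the sample, are at least consistent with the hardware cause the thread eventually settled on.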