mathomas3 Posted December 30, 2022
6 minutes ago, loady said: so I started it in GUI mode with no plugins, should I still start the array or just leave it like that?
I would warn you about the temperature of those drives! They will die a lot faster.
loady Posted December 30, 2022 Author
1 hour ago, mathomas3 said: I would warn you about the temperature of those drives! They will die a lot faster.
I'm still trying to find fan regulators for the 3 fans that face the drives. It's like sitting next to a jet engine if they are just plugged in without any speed control.
loady Posted December 31, 2022 Author
5 hours ago, JorgeB said: Start
Right, it's been up for nearly 20 hours now, the parity check has completed and there are no errors. All dockers are functional but no plugins are installed; I chose safe mode GUI with no plugins.
syslog
JorgeB Posted January 1, 2023
Rename all plg files to bak (/boot/config/plugins), then start renaming them back one by one to see if you can find the culprit.
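The rename step described above could be scripted. This is only a minimal sketch, assuming Python 3.9+ is available wherever the flash drive is mounted; the function names `disable_all` and `enable_one` are hypothetical, and only the `/boot/config/plugins` path comes from the post.

```python
from pathlib import Path

# Path taken from the post above; adjust it if the flash drive is mounted elsewhere.
PLUGIN_DIR = Path("/boot/config/plugins")


def disable_all(plugin_dir: Path) -> list[Path]:
    """Rename every *.plg file in plugin_dir to *.bak and return the new paths."""
    renamed = []
    for plg in sorted(plugin_dir.glob("*.plg")):
        bak = plg.with_suffix(".bak")  # only the final suffix is swapped
        plg.rename(bak)
        renamed.append(bak)
    return renamed


def enable_one(bak: Path) -> Path:
    """Rename a single *.bak file back to *.plg and return the restored path."""
    plg = bak.with_suffix(".plg")
    bak.rename(plg)
    return plg
```

One way to use it: call `disable_all(PLUGIN_DIR)` once, reboot, then call `enable_one(...)` on one returned path at a time, rebooting and watching for a crash after each. Only top-level `.plg` files are touched; plugin subfolders are left alone.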
loady Posted January 14, 2023 Author
On 1/1/2023 at 9:27 AM, JorgeB said: Rename all plg files to bak (/boot/config/plugins), then start renaming them back one by one to see if you can find the culprit.
The server has been in use so I have not been able to do what you said yet. However, whilst in safe mode GUI (no plugins) it has still been crashing; it just takes longer to do it.
syslog warptower-diagnostics-20230114-1305.zip
trurl Posted January 14, 2023
Did the crash occur between these timestamps?
Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc
Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022
A dump starts here, but that is several hours earlier:
Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI
Looks like you only completed one memtest pass in your earlier screenshot.
Also, didn't see anybody mention this
loady Posted April 3, 2023 Author
On 1/14/2023 at 1:55 PM, trurl said: Did the crash occur between these timestamps?
Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc
Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022
A dump starts here, but that is several hours earlier:
Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI
Looks like you only completed one memtest pass in your earlier screenshot. Also, didn't see anybody mention this
Sorry for the delay in responding, I have been using the server heavily so I have not been able to afford the downtime. No one mentioned the above. However, I have been running on this hardware for over four years and this started happening less than a year ago, maybe 6 months ago. I was advised to update the BIOS, which I did. For the last three crashes I have grabbed diagnostics. The crashes are very random: the last one I think took over a week, sometimes it will happen the same day. I can only operate in safe mode; if it boots normally it will crash in less than half an hour, and more consistently.
warptower-diagnostics-20230318-1326.zip warptower-diagnostics-20230321-1744.zip warptower-diagnostics-20230403-1235.zip
trurl Posted April 3, 2023
Diagnostics contains the current syslog, which has no entries from before the reboot, so these tell us nothing about what happened before you rebooted each time. That is the whole point of getting us syslogs from the syslog server.
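To find what was logged just before each reboot in a syslog-server file, something like the following could help. It is a rough sketch and not an Unraid tool: it assumes a boot is marked by the `kernel: Linux version` banner quoted earlier in the thread, and the fault patterns (`general protection fault`, `Call Trace`) are assumptions that can be extended. The function name `summarize` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: each boot starts with the kernel banner seen in the excerpt above.
BOOT_MARKER = re.compile(r"kernel: Linux version ")
# Assumption: fault strings worth flagging; add more patterns as needed.
FAULT_MARKER = re.compile(r"general protection fault|Call Trace", re.IGNORECASE)


def summarize(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (boots, faults).

    boots  -- (last non-empty line before the boot, boot banner line) pairs
    faults -- every line matching FAULT_MARKER, in file order
    """
    boots: List[Tuple[str, str]] = []
    faults: List[str] = []
    previous = ""
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.search(line):
            boots.append((previous, line))
        if FAULT_MARKER.search(line):
            faults.append(line)
        if line:
            previous = line
    return boots, faults
```

To use it, open the file written by the syslog server (its location depends on the server settings) and pass the file object to `summarize`. The gap between the two timestamps in each `boots` pair is the window in which the crash and hard reboot happened.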
loady Posted April 4, 2023 Author
17 hours ago, trurl said: Diagnostics contains the current syslog, which has no entries from before the reboot, so these tell us nothing about what happened before you rebooted each time. That is the whole point of getting us syslogs from the syslog server.
Ah, sorry, I forgot I had enabled this. Is this syslog of any use? It should show the times I have had to hard reboot the server.
syslog
JorgeB Posted April 4, 2023
13 minutes ago, loady said: it should show the times I have had to hard reboot the server
The Unraid driver is crashing. This is usually a hardware problem or a kernel compatibility issue. Try updating to v6.11.5 or v6.12-rc2; if the issue persists, it's likely hardware related.
loady Posted April 4, 2023 Author
55 minutes ago, JorgeB said: The Unraid driver is crashing. This is usually a hardware problem or a kernel compatibility issue. Try updating to v6.11.5 or v6.12-rc2; if the issue persists, it's likely hardware related.
That's going to be a headache if it is hardware related... where do you even start looking? I suppose I could remove the memory and start with just one stick to see if it persists, adding a stick at a time. It would be good if it was the mobo, because I am looking at upgrading to one with two M.2 slots.
loady Posted April 25, 2023 Author
So, it was up for over six days in safe mode and crashed again last night. If this is hardware related, where would the logical place be to start? I have a 10TB drive I want to use to replace my parity drive; I'm guessing it's not wise to do so whilst it is exhibiting this behaviour?
syslog (1) warptower-diagnostics-20230425-1348.zip
JorgeB Posted April 25, 2023
1 hour ago, loady said: So, it was up for over six days in safe mode and crashed again last night. If this is hardware related, where would the logical place be to start?
Did you do the Ryzen specific settings linked above? I didn't see a reply about that. If you did and the issues persist, the board/CPU would be the main suspects.
loady Posted May 6 Author (edited)
On 4/25/2023 at 3:45 PM, JorgeB said: Did you do the Ryzen specific settings linked above? I didn't see a reply about that. If you did and the issues persist, the board/CPU would be the main suspects.
Wow, have I been bumbling along like this for a year! My server is still crashing constantly, daily if not left in safe mode; in safe mode I might get longer between crashes. Parity usually hangs at a certain point pretty much all the time. I have checked the BIOS for the C-states, but as far as I can see they are not enabled. I couldn't really find the exact field described in the thread for it; as you can see, C-states were off and had been anyway, and restore on AC power loss was set to power off.
Now some of the drives have errors. I have spare drives to replace them, and I actually want to change my parity drive to a 10TB disk. However, I am not sure if it is wise to do this whilst it's crashing like this? If it crashes whilst replacing parity I think I might be up the creek without a paddle. It's been said it is 'probably' a hardware issue. I had done memtest and looked at all the other suggestions. Would the hard drives cause this crashing? Should I boot the server to get the GUI but NOT mount the disks and leave it for a few days? If the disks were causing the issue, would leaving them unmounted take them out of play as the cause?
I am even looking to see if replacing the motherboard will resolve the issue, but where do you start with these kinds of problems? The power supply was replaced very recently when it was found to be dropping out and causing disk errors; the new PSU resolved this. The attached syslog is from the server booted without the drives mounted.
warptower-diagnostics-20240506-1308.zip
Edited May 6 by loady
JorgeB Posted May 6
1 minute ago, loady said: Parity usually hangs at a certain point pretty much all the time.
If it crashes during a parity check/sync it should not be related to the C-states, since those only matter when the server is idling.
2 minutes ago, loady said: I am not sure if it is wise to do this whilst it's crashing like this?
I would not recommend it.
loady Posted May 6 Author
1 minute ago, JorgeB said: If it crashes during a parity check/sync it should not be related to the C-states, since those only matter when the server is idling. I would not recommend it.
warptower-diagnostics-20240506-1308.zip
loady Posted May 6 Author
I have been running on this equipment, mobo etc., without issue for a good few years. This all started about 18 months ago. I use the server quite a bit, so trying to get to the bottom of it is frustrating. I am at the point of throwing money at it now.
JorgeB Posted May 6
Since you have multiple RAM sticks I would try with just one; if it still crashes, try with a different one. That will basically rule out bad RAM. After that the board, CPU or PSU would be the next suspects.
loady Posted May 6 Author (edited)
32 minutes ago, JorgeB said: Since you have multiple RAM sticks I would try with just one; if it still crashes, try with a different one. That will basically rule out bad RAM. After that the board, CPU or PSU would be the next suspects.
OK, that I will try. So is it pointless for me to just leave the server with the disks unmounted to see if it crashes like this? I do actually want to upgrade the board to one with two M.2 slots... any recommendations?
Edited May 6 by loady
loady Posted May 11 Author
Now this is odd... the server's been up in safe mode for 4 days. I just went into the GUI and looked at my dockers, but it seemed to snarl up. Now the GUI is unresponsive and won't load at all, but I can see all my shares in Windows. I just went into a disk and clicked something and I could hear the disk spin up, so it's still working somewhere.
loady Posted May 11 Author (edited)
2 hours ago, trurl said: Post new diagnostics
warptower-diagnostics-20240511-1535.zip syslog
So the GUI is unresponsive and I hooked up a monitor to the server; that's just a black screen. I tried to SSH into the server and it prompted for my user/pass but just hung when I entered the password. Shares were still working and showing in Windows, and I opened a few files. I had to hard reboot the server, so the diags are from after that. I had enabled the syslog server some time ago, so I added that too.
Edited May 11 by loady
JorgeB Posted May 11
Constant call traces. Start by running memtest. If nothing is found, and because memtest is only definitive if it finds errors, try with just one stick of RAM; if it still crashes, try with a different one. That will basically rule out bad RAM. If issues persist, another thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
loady Posted May 11 Author
12 minutes ago, JorgeB said: Constant call traces. Start by running memtest. If nothing is found, and because memtest is only definitive if it finds errors, try with just one stick of RAM; if it still crashes, try with a different one. That will basically rule out bad RAM. If issues persist, another thing you can try is to boot the server in safe mode with all docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
Testing RAM sticks is my next move. I have done memtest previously and found no errors. When you say start in safe mode with dockers disabled, do you mean I start safe mode with GUI but no plugins and then disable the dockers manually one by one?