Parity check not finishing


loady


  • 2 weeks later...
On 1/1/2023 at 9:27 AM, JorgeB said:

Rename all plg files to bak (/boot/config/plugins) then start renaming them back one by one to see if you can find the culprit.
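The rename-and-restore procedure suggested above could be sketched roughly as follows. This is only an illustration, not an official Unraid tool: the function names `disable_all` and `enable_one` are hypothetical, and appending a `.bak` suffix (rather than replacing the extension) is an assumption. The directory `/boot/config/plugins` is the one named in the quote.

```python
from pathlib import Path

# Directory named in the quoted advice; everything else here is illustrative.
PLUGIN_DIR = Path("/boot/config/plugins")


def disable_all(plugin_dir=PLUGIN_DIR):
    """Rename every *.plg file to *.plg.bak and return the new paths, sorted."""
    disabled = []
    for plg in sorted(Path(plugin_dir).glob("*.plg")):
        bak = plg.with_name(plg.name + ".bak")  # assumption: append ".bak"
        plg.rename(bak)
        disabled.append(bak)
    return disabled


def enable_one(bak):
    """Rename a single *.plg.bak file back to *.plg and return the restored path."""
    bak = Path(bak)
    plg = bak.with_name(bak.name[: -len(".bak")])
    bak.rename(plg)
    return plg
```

The idea would be to call `disable_all()` once, then call `enable_one()` on a single entry at a time, presumably rebooting and watching for a crash between each step, so that the last plugin restored before a crash becomes the suspect.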

The server has been in use, so I have not been able to do what you said yet. However, while in safe mode with GUI (no plugins) it has still been crashing; it just takes longer to do so.

syslog warptower-diagnostics-20230114-1305.zip

Link to comment

Did the crash occur between these timestamps?

Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc
Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022

 

dump starts here but that is several hours earlier

Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI
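The reasoning above (the last line logged before a boot banner brackets the crash, while the fault is logged earlier) can be sketched as a small syslog scan. This is a minimal sketch under stated assumptions: the function names are hypothetical, the marker strings are taken from the log lines quoted in this post, and the year has to be supplied by the caller because syslog timestamps do not include it.

```python
from datetime import datetime

# Marker strings copied from the log lines quoted in this post.
BOOT_MARKER = "kernel: Linux version"
FAULT_MARKER = "general protection fault"


def parse_stamp(line, year):
    """Parse the leading 'Jan 13 16:42:12' style timestamp of a syslog line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def crash_windows(lines, year):
    """Return (windows, faults).

    windows: (last timestamp before a boot banner, boot timestamp) pairs.
    faults:  timestamps of lines containing the fault marker.
    """
    windows, faults = [], []
    previous = None
    for line in lines:
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines without a parseable timestamp
        if FAULT_MARKER in line:
            faults.append(stamp)
        if BOOT_MARKER in line and previous is not None:
            windows.append((previous, stamp))
        previous = stamp
    return windows, faults
```

Fed the three lines quoted here (with the year given as 2023), this would report one window from Jan 13 16:42:12 to Jan 14 12:45:32 and one fault at Jan 13 09:03:52, which is the gap being asked about.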

 

Looks like you only completed one memtest pass in your earlier screenshot. 

 

Also, didn't see anybody mention this

 

Link to comment
  • 2 months later...
On 1/14/2023 at 1:55 PM, trurl said:

Did the crash occur between these timestamps?

Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc
Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022

 

dump starts here but that is several hours earlier

Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI

 

Looks like you only completed one memtest pass in your earlier screenshot. 

 

Also, didn't see anybody mention this

Sorry for the delay in responding; I have been using the server heavily, so I have not been able to afford the downtime. No one mentioned the above. However, I have been running on this hardware for over four years and this started happening less than a year ago, maybe 6 months ago. I was advised to update the BIOS, which I did. For the last three crashes I have grabbed diags. The crashes are very random: the last one, I think, took over a week, and sometimes it will happen the same day. I can only operate in safe mode; if it boots normally it will crash in less than half an hour, and more consistently.

 

warptower-diagnostics-20230318-1326.zip warptower-diagnostics-20230321-1744.zip warptower-diagnostics-20230403-1235.zip

Link to comment
55 minutes ago, JorgeB said:

Unraid driver is crashing, this usually is a hardware problem or a kernel compatibility issue, try updating to v6.11.5 or v6.12-rc2 and if the issue persists it's likely hardware related.

That's going to be a headache if it is hardware related... where do you even start looking? I suppose I could remove the memory and just start with one stick and see if it persists, adding a stick at a time. It would be good if it was the mobo, because I am looking at upgrading that to one with two M.2 slots.

 

Link to comment
  • 3 weeks later...
1 hour ago, loady said:

So, it was up for over six days in safe mode and crashed again last night.. if this is hardware related where would the logical place be to start ?

Did you do the Ryzen specific settings linked above? Didn't see a reply about that, if you did and issues persist board/CPU would be the main suspects.

Link to comment
  • 1 year later...
Posted (edited)
On 4/25/2023 at 3:45 PM, JorgeB said:

Did you do the Ryzen specific settings linked above? Didn't see a reply about that, if you did and issues persist board/CPU would be the main suspects.

Wow... have I been bumbling along like this for a year!

 

My server is still crashing constantly, daily if not left in safe mode; in safe mode I might get longer between crashes. Parity usually hangs at a certain point pretty much every time. I have checked the BIOS for the C-states, but as far as I can see they are not enabled. I couldn't really find the exact field described in the thread for it, but as you can see, C-states were off and had been anyway, and restore on AC power loss was set to power off.

G47KKFl.png

 

Now some of the drives have errors. I have spare drives to replace them, and I actually want to change my parity drive to a 10TB disk. However, I am not sure if it is wise to do this while it's crashing like this? If it crashes while replacing parity, I think I might be up the creek without a paddle. It's been said it is 'probably' a hardware issue; I have done memtest and looked at all the other suggestions. Would the hard drives cause this crashing? Should I boot the server to get the GUI but NOT mount the disks and leave it for a few days? If the disks were causing the issue, would not mounting them take them out of play as the cause? I am even looking at whether replacing the motherboard will resolve the issue, but where do you start with these kinds of problems? The power supply was replaced very recently when it was found to be dropping out and causing disk errors; the new PSU resolved this.

 

The attached syslog is from the server booted without the drives mounted.

warptower-diagnostics-20240506-1308.zip

Edited by loady
Link to comment
1 minute ago, loady said:

parity usually hangs at a certain point pretty much all the time,

If it crashes during parity check/sync it should not be related to the C-states, since those only matter when the server is idling.

 

2 minutes ago, loady said:

I am not sure if it is wise to do this while it's crashing like this?

I would not recommend it.

 

Link to comment

I have been running on this equipment (mobo etc.) without issue for a good few years; this all started about 18 months ago. I use the server quite a bit, so trying to get to the bottom of it is frustrating. I am at the point of throwing money at it now.

 

Link to comment
Posted (edited)
32 minutes ago, JorgeB said:

Since you have multiple RAM sticks, I would try with just one; if it's the same, try with a different one. That will basically rule out bad RAM. After that, the board, CPU or PSU would be the next suspects.

OK, that I will try. So is it pointless for me to just leave the server with the disks unmounted to see if it crashes like this?

 

I do actually want to upgrade the board to one with two M.2 slots... any recommendations?

Edited by loady
Link to comment

Now this is odd... the server's been up in safe mode for 4 days. I just went into the GUI and looked at my dockers, but it seemed to snarl up. Now the GUI is unresponsive and won't load at all, but I can see all my shares in Windows. I just went into a disk and clicked something and I could hear the disk spin up, so it's still working somewhere.

Link to comment
Posted (edited)
2 hours ago, trurl said:

Post new diagnostics

warptower-diagnostics-20240511-1535.zip syslog

 

So the GUI is unresponsive, and I hooked up a monitor to the server; that's just a black screen. I tried to SSH into the server and it prompted for my user/pass but just hung when I entered the password. Shares were still working and showing in Windows, and I opened a few files. I had to hard reboot the server, so the diags are from after that. I had enabled the syslog server some time ago, so I added that too.

Edited by loady
Link to comment

Constant call traces. Start by running memtest. If nothing is found, and because memtest is only definitive if it finds errors, try with just one stick of RAM; if it's the same, try with a different one. That will basically rule out bad RAM. If issues persist, another thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

Link to comment
12 minutes ago, JorgeB said:

Constant call traces. Start by running memtest. If nothing is found, and because memtest is only definitive if it finds errors, try with just one stick of RAM; if it's the same, try with a different one. That will basically rule out bad RAM. If issues persist, another thing you can try is to boot the server in safe mode with all Docker containers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning on the other services one by one.

Testing RAM sticks is my next move. I have done memtest previously and found no errors. When you say start in safe mode with dockers disabled, is that to say I start safe mode with GUI but no plugins and then disable the dockers manually one by one?

Link to comment
