loady

Posts posted by loady

  1. What's the possibility of a faulty flash drive that Unraid boots from causing the issues? Twice now it has been removed as the first boot device, and today when I tried to boot it wasn't even showing up in the BIOS. I unplugged it and checked that it came up in Windows (which complained it wanted to fix it, which I didn't do), plugged it back into the server, booted, and set it as the first boot device again.

  2. 18 minutes ago, trurl said:

    Do you have a cooling problem?

    Well, I shouldn't have. This case has three fans in it, but I can't even run one as it is like a rocket. I have been looking for some kind of regulator I can put directly inline but have not had any luck finding anything. It is something I need to do.

  3. So the array seems fine in safe mode and will stay up. If I let it start in normal mode it gradually degrades, usually during the parity check. When I came in today it was unresponsive, but you could hear all the drives chattering and could literally fry an egg on it. I have hard rebooted the server into normal mode and stopped the parity check to see if it stagnates again. It seemed to be fine once I had removed one stick of RAM; I have moved it to a different slot now.

    syslog syslog-previous warptower-diagnostics-20240622-1234.zip

  4. 10 minutes ago, JorgeB said:

    Try booting in safe mode and/or closing any browser windows open to the GUI, only open when you need to use it then close again.

    Ideally I don't want to run in safe mode constantly. I just spun down all the drives and they have all spun up again; it was never doing this before. Three of the drives have errors, though they are not red-balled, so I assume they are just warnings. Is it being caused because I only have one stick of 8GB RAM in it?

  5. Well, it's been up and fine for well over a week, running in normal mode with no crashes after removing one stick of RAM... however, I started to notice the drives were chattering away by themselves and the GUI became less and less responsive. It's not crashed like it was doing before, but it's not usable and pressing power won't shut it down. The drives are going mad. I have had to hard shut down and start up again, but stopped the parity check this time. Syslog and diags attached. Loading things is painfully slow and just gets slower.

    syslog syslog-previous warptower-diagnostics-20240602-1047.zip

  6. 3 hours ago, itimpi said:

    It could be.   It could also be simply the fact of having the extra RAM module installed where each one checks out fine individually, but you get failures when both installed.

    Well, that would be a bummer and about my luck... at least I have stability for now, running on one stick of 8GB.

  7. On 5/13/2024 at 10:06 AM, JorgeB said:

     That's a good place to start, start by running memtest, but note that memtest is only definitive if it finds errors, if you have multiple sticks you can also try with just one, if the same try with a different one, that will basically rule out bad RAM, next suspects after that would be PSU, board, CPU.

    So, I popped the lid, removed one of the two RAM sticks, fired up the server in safe mode and left all dockers running. It's been up for two days now; I started a parity check yesterday and it has finished. Do the diags look better? Next I will swap the sticks over. Could it also be the RAM slot and not the actual RAM?

    warptower-diagnostics-20240521-0902.zip

  8. 6 hours ago, JorgeB said:

    Starting in safe mode disables all plugins, you can then also disable VM and docker services, and if stable, start enabling one by one, including individual containers and VMs, if that's not practical, you can try the other way, that is, still in safe mode, but only disable one VM or one container, if still issues enable that one and disable the next.

    So I started in safe mode GUI with no plugins, there were no VMs running (stopped), and I stopped all the dockers. The parity check started but it's just hung at 2.3%. It never completes a parity check; it either hangs or the server crashes and I have to hard reboot... Safe to say it's not the dockers or anything? Time to start testing the RAM sticks?

    warptower-diagnostics-20240512-1545.zip

  9. 12 minutes ago, JorgeB said:

    Constant call traces, start by running memtest, if nothing is found, and because memtest is only definitive if it finds errors, try with just one stick of RAM, if the same try with a different one, that will basically rule out bad RAM, if issues persist, another thing you can try is to boot the server in safe mode with all docker containers/VMs disabled, let it run as a basic NAS for a few days, if it still crashes it's likely a hardware problem, if it doesn't start turning on the other services one by one.

    Testing RAM sticks is my next move. I have done memtest previously and found no errors. When you say start in safe mode with dockers disabled, is that to say I start safe mode with GUI but no plugins and then disable the dockers manually one by one?

  10. 2 hours ago, trurl said:

    Post new diagnostics

    warptower-diagnostics-20240511-1535.zip syslog

     

    So the GUI is unresponsive, and when I hooked up a monitor to the server that was just a black screen. I tried to SSH into the server and it prompted for my user/pass but just hung when I entered the password. Shares were still working and showing in Windows, and I opened a few files. I had to hard reboot the server, so the diags are from after that. I had enabled the syslog server some time ago, so I added that too.

  11. Now this is odd... the server's been up in safe mode for 4 days. I just went into the GUI and looked at my dockers, but it seemed to snarl up. Now the GUI is unresponsive and won't load at all, but I can see all my shares in Windows. I just went into a disk and clicked something and I could hear the disk spin up, so it's still working somewhere.

  12. 32 minutes ago, JorgeB said:

    Since you have multiple RAM sticks I would try with just one, if the same try with a different one, that will basically rule out bad RAM, after that board, CPU or PSU would be the next suspects.

    OK, that I will try. So is it pointless me just leaving the server with the disks unmounted to see if it crashes like this?

     

    I do actually want to upgrade the board to one with two M.2 slots... any recommendations?

  13. I have been running on this equipment (mobo etc.) without issue for a good few years; this all started about 18 months ago. I use the server quite a bit, so trying to get to the bottom of it is frustrating. I am at the point of throwing money at it now.

     

  14. On 4/25/2023 at 3:45 PM, JorgeB said:

    Did you do the Ryzen specific settings linked above? Didn't see a reply about that, if you did and issues persist board/CPU would be the main suspects.

    Wow... have I been bumbling along like this for a year!

     

    My server is still crashing constantly, daily if not left in safe mode; in safe mode I might get longer between crashes. Parity usually hangs at a certain point pretty much all the time. I have checked the BIOS for the C-states but as far as I can see they are not enabled. I couldn't really find the exact field to look at as described in the thread for it. As you can see, C-states were off and had been anyway, and restore on AC power loss was set to power off (attached screenshot: G47KKFl.png).

     

    Now some of the drives have errors. I have spare drives to replace them and I actually want to change my parity drive to a 10TB disk; however, I am not sure if it is wise to do this whilst it's crashing like this. If it crashes whilst replacing parity I think I might be up the creek without a paddle. It's been said it is 'probably' a hardware issue. I had done memtest and looked at all the other suggestions. Would the hard drives cause this crashing? Should I boot the server to get the GUI but NOT mount the disks and leave it for a few days? If the disks were causing the issue, would leaving them unmounted rule them out as the cause? I am even looking to see if replacing the motherboard will resolve the issue, but where do you start with these kinds of problems? The power supply was replaced very recently when it was found to be dropping out and causing disk errors; the new PSU resolved this.

     

    The attached syslog is from the server booted without the drives mounted.

    warptower-diagnostics-20240506-1308.zip

  15. 55 minutes ago, JorgeB said:

    Unraid driver is crashing, this usually is a hardware problem or a kernel compatibility issue, try updating to v6.11.5 or v6.12-rc2 and if the issue persists it's likely hardware related.

    That's going to be a headache if it is hardware related... where do you even start looking? I suppose I could remove the memory and just start with one stick and see if it persists, adding a stick at a time. It would be good if it was the mobo because I am looking at upgrading that to one with two M.2 slots.

     

  16. On 1/14/2023 at 1:55 PM, trurl said:

    Did the crash occur between these timestamps?

    Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc
    Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022
    

     

    dump starts here but that is several hours earlier

    Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI
    

     

    Looks like you only completed one memtest pass in your earlier screenshot. 

     

    Also, didn't see anybody mention this

    Sorry for the delay in responding, I have been using the server heavily so I have not been able to afford the downtime. No one mentioned the above; however, I have been running on this hardware for over four years and this started happening less than a year ago, maybe 6 months ago. I was advised to update the BIOS, which I did. For the last three crashes I have grabbed diags. The crashes are very random: the last one I think took over a week, sometimes it will happen the same day. I can only operate in safe mode; if it boots normally it will happen in less than half an hour and crashes more consistently.

     

    warptower-diagnostics-20230318-1326.zip warptower-diagnostics-20230321-1744.zip warptower-diagnostics-20230403-1235.zip
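
For anyone wanting to answer the "did the crash occur between these timestamps?" question from a saved syslog, a rough Python sketch might look like the following. It is not from the thread: the function names, the fault marker strings, and the 12 hour gap threshold are my own assumptions, and the sample lines are just the three syslog lines quoted above. Syslog stamps carry no year, so the year has to be passed in, and a log that spans New Year would need extra handling.

```python
from datetime import datetime, timedelta

# Strings that hint at a kernel fault (assumed list, extend as needed).
FAULT_MARKERS = ("general protection fault", "Call Trace")

def parse_ts(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp; return None if absent."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def find_gaps(lines, year, min_gap=timedelta(hours=12)):
    """Return (last_seen, next_seen) pairs where logging went silent."""
    gaps, prev = [], None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip lines without a timestamp
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps

def find_faults(lines):
    """Return the lines that contain any of the fault markers."""
    return [ln.rstrip() for ln in lines if any(m in ln for m in FAULT_MARKERS)]

# The three lines quoted in this post, in chronological order.
SAMPLE = [
    "Jan 13 09:03:52 Warptower kernel: general protection fault, probably for non-canonical address 0x30000000000020: 0000 [#1] SMP NOPTI",
    "Jan 13 16:42:12 Warptower emhttpd: read SMART /dev/sdc",
    "Jan 14 12:45:32 Warptower kernel: Linux version 5.15.46-Unraid (root@Develop) (gcc (GCC) 11.2.0, GNU ld version 2.37-slack15) #1 SMP Fri Jun 10 11:08:41 PDT 2022",
]

if __name__ == "__main__":
    for before, after in find_gaps(SAMPLE, 2023):
        print(f"silent from {before} until {after}")
    for fault in find_faults(SAMPLE):
        print("fault:", fault[:60])
```

To run it against a real log, read the file with `open(path).readlines()` and pass that in place of `SAMPLE`. A long silent stretch followed by a "Linux version" boot line is only a hint at when the box went down, not proof of the cause.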

  17. 1 hour ago, mathomas3 said:

    I would warn you on the temp of those drives! They will die a lot faster

    I'm still trying to find fan regulators for the 3 fans that face the drives. It's like sitting next to a jet engine if they are just plugged in without any speed control.
