Unraid crashing during data rebuild



I recently decided to upgrade one of my drives (a 4TB one) with an older 8TB drive that I had replaced a while ago because it was getting read errors. It turned out those read errors were just a bad cable and the drive was fine, and it sat on my shelf for a while. So I plugged it in, started preclearing it, and when I got time I shut the system down and physically swapped the drives. The system had been up for a little over a month before this. I followed the normal procedure of turning the Docker and VM services off and letting Unraid do the data rebuild without anything accessing the array. However, now it crashes during the rebuild: sometimes it's right away, and other times it gets almost done before crashing. I've attached my diagnostics from the last restart as of last night. I have already followed the advice for Ryzen systems regarding C-states and memory. My memory is fine; I had a previous problem with random crashing that ended up being my PSU, and I already ran a memtest for 24 hrs with no problems.

The drive I replaced is the WD 4TB drive, and I replaced it with an 8TB Seagate, so all my data drives are 8TB now. Both drives are attached, and my last hope now is to forget about changing that drive, go back to the 4TB, and see if Unraid stops crashing. Any help would be great, although I'm out of town for the next 3 days, so I will try what I can remotely or wait till I get home to try anything physical with the system.

otfgserver-diagnostics-20220627-2138.zip


I just got back in town last night, updated Unraid and let it rebuild again. I apparently had syslog mirroring already on, so when I got up this morning and it had crashed again I checked it, but it didn't show anything more than normal operating messages. The entries from the last hour before it crashed were just "not running mover due to parity check/rebuild" and stuff like that. I did take a picture of the monitor hooked up to the server, so here's the crash log it was displaying. Hope this helps some; if not, I will upload the syslog and the message on screen next time it crashes.

20220701_080529.jpg
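
When the mirrored syslog is mostly routine entries like the mover message above, filtering them out makes anything unusual easier to spot. This is only a rough Python sketch: the marker string is taken from the message quoted above, and the function name and sample log lines are made up for illustration, not real output from this server.

```python
from typing import Iterable, List, Tuple

# Assumption: routine entries can be recognised by this substring,
# taken from the message quoted in the post above.
ROUTINE_MARKERS: Tuple[str, ...] = ("not running mover",)


def unusual_lines(lines: Iterable[str],
                  markers: Tuple[str, ...] = ROUTINE_MARKERS) -> List[str]:
    """Return the non-empty lines that contain none of the routine markers."""
    kept = []
    for line in lines:
        text = line.rstrip("\n")
        if not text.strip():
            continue  # skip blank lines
        if any(marker in text.lower() for marker in markers):
            continue  # skip routine entries
        kept.append(text)
    return kept


# Hypothetical sample lines, for illustration only.
sample = [
    "Jul  1 06:00:01 server root: not running mover due to parity check/rebuild\n",
    "Jul  1 07:00:01 server root: not running mover due to parity check/rebuild\n",
    "Jul  1 07:42:10 server kernel: example of some other message\n",
]

for entry in unusual_lines(sample):
    print(entry)
```

To run it against a saved log, pass an open file instead of the sample list, e.g. `unusual_lines(open("syslog"))`, where `syslog` is whatever path the mirrored log was copied to.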


I've looked at my syslog quite a lot, and I will say again there wasn't anything in it that indicated the start of a crash or any errors, just messages every hour or so saying it's not running mover due to parity check/rebuild. I also cleared the syslog, as it had apparently been logging for a few weeks without being cleared. I also have another script that's supposed to save each syslog, from boot to shutdown/crash, to a separate file. I still have that one, but like I said it doesn't appear to have any info. It also only started once the GUI started (at least until I realized that and fixed it by adding it to the go file last night). I will upload the current one as well, as it includes boot.

The file ending with 415 is from yesterday, from GUI load to crash, and 595 is today's boot-up.

syslog-1656641415 syslog-1656684595
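
The numbers on the end of those filenames look like Unix timestamps (seconds since 1970). That's an assumption, since the script that names the files isn't shown here. A minimal Python sketch to turn a filename into a readable time, with a made-up function name:

```python
from datetime import datetime, timezone


def syslog_start_time(filename: str) -> datetime:
    """Parse the trailing number of 'syslog-<number>' as Unix epoch seconds.

    Assumption: the suffix is an epoch timestamp in seconds.
    """
    suffix = filename.rsplit("-", 1)[-1]
    return datetime.fromtimestamp(int(suffix), tz=timezone.utc)


for name in ("syslog-1656641415", "syslog-1656684595"):
    print(name, "->", syslog_start_time(name).isoformat())
# syslog-1656641415 -> 2022-07-01T02:10:15+00:00
# syslog-1656684595 -> 2022-07-01T14:09:55+00:00
```

The times are printed in UTC, so the local time on the server will differ by its timezone offset.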


I thought that was weird too. This is the first time I've noticed that it hasn't logged the traces. When it crashes again (if it crashes again...) I'll grab the current syslog and see if it has any insight.

 

*edit* The only thing I can see that's different is that beside the status in the bottom corner it says "starting libvirtwol....." even though I've got VMs turned off.

Edited by me160

Update: it crashed again, and again there's nothing in the syslog. I'll upload it anyway, along with a picture of the console log. Unfortunately the top of the log is cut off, which is a shame as it probably has what caused the crash in it, but the console won't scroll up, so here's what I've got. Hope this can help a bit more?

20220701_112502.jpg

syslog


I was gonna do that anyway, but was gonna wait to see if there was something I could do within Unraid. I'll do a memtest now and report back when it finishes.

PS. I ran a memtest about 2 months ago and it came out clean, so I think this one will too, but who can tell till it's done, right?


OK, got an update here. Memtest passed with 0 errors, so I booted back into Unraid and tried a rebuild again, and this time it froze rather than crashing... not sure if that's better or worse? The log I'm uploading is the one from that freeze. It looks like it recovered from it at around 11:00am, but when I checked the system at 4pm it was not responding. The web GUI wouldn't load (got "page not found" in Chrome), and the console weirdly didn't show any signs of a crash and was responding to keyboard input. However, it didn't have the login prompt: normally it finishes booting and the last line says "otfgserver login:" or something to that effect, but this time it was just a blinking cursor, and it did the same when I rebooted it.

syslog-1656773310


I'll be honest, in the 4 years I've had this install (and 3-4 hardware changes, all Ryzen since the first), the only time it was stable for more than a month or 2 was with the first set of hardware, and that was an AMD FX series CPU. As soon as I upgraded to Ryzen I've had problems... I'm thinking I might just make the investment and go Intel, but I like the power efficiency of AMD, and I've noticed the new 12th gen Intel still isn't 100% compatible with the current Unraid kernel. And no, I don't have anything overclocked on the system.


I have an update. After I posted my last message here I decided I'd do a BIOS update on my motherboard. I was hesitant to do this, as the last time I had to update a BIOS it ended up causing Unraid to not boot at all and basically bricked my Unraid USB drive (not entirely sure how that happened...), and I had to buy a new motherboard. Anyway, after I updated the BIOS I went through my BIOS settings, because apparently this board doesn't always keep its settings when updating, and changed a few things there: boot order, enabling Resizable BAR, enabling virtualization (which I was sure I had turned on before because I run VMs), and disabling some of the secure/fast boot options for Windows. I got back into Unraid and started a rebuild, and lo and behold, it finished the rebuild this morning. I'm not quite sure how updating my BIOS changed anything, as my understanding is that it just handles device discovery and a sort of system self-test before handing everything over to the OS, but it seems to have solved my rebuild problem. Now, does anyone know why the physical console monitor I have attached no longer has a login prompt? It's only been like that since I updated. Did they remove that CLI function in this version for some reason?

