
Unraid crashed unexpectedly


Solved by JorgeB


Ever since I upgraded to 6.12, I have been experiencing random crashes where everything is completely inaccessible.

 

When it crashes there is no display on a monitor and no access via URL or PuTTY. It will ping, but that's about all it will do, so I end up having to force a power down. Last time I did this the cache drive got corrupted, but I was able to run some commands and find out which folder was the problem. For some reason I am not able to do that this time.

Can someone please review the diagnostics and syslogs and give some input on what has caused this?

 

Current Unraid version is 6.12.2. I can try upgrading to the next version, but first I need to find out where the corrupted file is on my cache. If anyone is able to work this out or provide a command to locate the corrupted files, that would be great.

 

This is the check cache script I created and the info it provided. FYI, the below is the first run of the script after boot-up. The scan runs hourly, so I didn't want to upload every run since the output is the same each time.

 

Script Starting Oct 11, 2023  22:47.01

Full logs for this script are available at /tmp/user.scripts/tmpScripts/checkcachedrives/log.txt

[/dev/sde1].write_io_errs    0
[/dev/sde1].read_io_errs     0
[/dev/sde1].flush_io_errs    0
[/dev/sde1].corruption_errs  0
[/dev/sde1].generation_errs  0
[/dev/sdg1].write_io_errs    0
[/dev/sdg1].read_io_errs     0
[/dev/sdg1].flush_io_errs    0
[/dev/sdg1].corruption_errs  16
[/dev/sdg1].generation_errs  0
Script Finished Oct 11, 2023  22:47.04
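
For anyone reading along, here is a minimal sketch of how output like the above could be filtered down to the counters that are not zero. The function name `nonzero_counters` and the `/mnt/cache` mount point in the usage comment are assumptions for illustration, not part of the script used here.

```shell
# Print only the btrfs device stats counters whose value is greater than zero.
# Reads lines such as "[/dev/sdg1].corruption_errs  16" on stdin.
nonzero_counters() {
  awk '$2 + 0 > 0 { print $1, $2 }'
}

# Hypothetical usage (mount point assumed):
#   btrfs device stats /mnt/cache | nonzero_counters
```

Fed the counter lines from the run above, this would print only the `[/dev/sdg1].corruption_errs` line.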

 

I have just disabled Docker so I can run mover to move everything off my cache drives, because last time it left the corrupted files on the cache.

 

Please, any assistance would be great. These crashes are very random: sometimes it can run for days, sometimes weeks, and sometimes a month or two before a crash.

vault101-diagnostics-20231013-0905.zip vault101-syslog-20231013-0804.zip

Edited by LoyalScotsman
7 minutes ago, itimpi said:

If you try to copy the files off the cache, then since the file system is BTRFS, which automatically checksums files, the ones that are corrupt will fail to copy and report an error.

Thank you, this is probably why it didn't move the last corrupted files. Mover has just completed and by the looks of it the only files left are some appdata files for Dropbox. The last time it got corrupted it was my Plex DB, which I managed to save, so I followed your script guide and now run a backup of this daily.

Edited by LoyalScotsman
On 10/13/2023 at 10:44 AM, JorgeB said:

You can run a correcting scrub, it should list any corrupt files, but since it's a mirror they might get fixed.
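
As a rough sketch of how the files named by a scrub could be pulled out of the syslog afterwards: this assumes the kernel reports them in `checksum error` lines ending in `(path: ...)`, which may differ between kernel versions. The function name `corrupt_paths`, the `/mnt/cache` mount point and the `/var/log/syslog` location are assumed for illustration.

```shell
# Extract the unique file paths from btrfs "checksum error" kernel log lines.
# Assumes each such line ends with "(path: some/file)"; other lines are ignored.
corrupt_paths() {
  sed -n '/checksum error/s/.*(path: \(.*\))$/\1/p' | sort -u
}

# Hypothetical usage (pool mount point and log location assumed):
#   btrfs scrub start -B /mnt/cache
#   corrupt_paths < /var/log/syslog
```

The paths printed would be relative to the subvolume root, so they still need to be matched up against the pool's mount point.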

I did this but it wouldn't fix them, so I moved everything off, formatted the cache, then copied everything back over.

 

Not sure if it's the same issue, but the issues continue, see below.

 

Plex started giving errors when trying to watch something, so I restarted the Plex Docker container, which then stopped working at all. I shut Plex down and started it up again with no change; my server was down. I turned Docker off then back on, and it then said Docker failed to start. I then stopped the array for a reboot and it got stuck at unmounting disks, so I found this forum.

I ran the commands on it but no luck (see screenshot), so I rebooted to see if this would bring Docker back up. Now my cache disks are showing as incorrect format, and some Docker containers are working but not all.

 

I have attached the diagnostics and syslogs from before I rebooted below, and will create another comment with fresh diagnostics and logs from after the reboot. Please help me on this one. I have not yet formatted the cache disks and am awaiting your response.

Screenshot_20231015_201149_Chrome.jpg

Screenshot_20231015_202443_Chrome.jpg

vault101-syslog-20231015-1913.zip vault101-diagnostics-20231015-2013.zip

1 hour ago, JorgeB said:

If the log tree is the only issue this may help:

 

btrfs rescue zero-log /dev/sdf1

 

Then re-start array, also change docker network to ipvlan.

Ideal, this has fixed my cache. You're a lifesaver :D

Next question, regarding changing the Docker network to ipvlan: it's currently on macvlan, which you are clearly aware of. The reason I ask is that I have never adjusted this and it has always been macvlan.

What implications will this have? Will any reconfiguration be needed for anything within Docker, or is it a switch and done?

I also noticed I got the pop-up for Unraid 6.12.4. Is it worth upgrading? It got stuck on unmounting again and I had to run the losetup and umount commands to actually get the array to stop, but going by the above forum thread that should have been fixed in the version of Unraid I have installed.

Edited by LoyalScotsman

Hello

 

Sorry to be a pain, but I just had the crash where it completely locks out everything: won't ping, can't access the URL, no display when a monitor is connected.

I checked my router and it showed the static IP as not connected, even though the NIC lights are flashing on the back of the server, so it has completely crashed from what it looks like. I have just forced a shutdown with the power button (it is now doing a parity check) and attached the logs and diagnostics from this fresh boot-up. Hopefully you can find something that explains why this keeps happening.

vault101-diagnostics-20231016-1128.zip vault101-syslog-20231016-1028.zip

 

 

Also, my cache disks have errors on them again, see below.

 

Script Starting Oct 16, 2023 11:50.35

Full logs for this script are available at /tmp/user.scripts/tmpScripts/checkcachedrives/log.txt

[/dev/sdf1].write_io_errs 0
[/dev/sdf1].read_io_errs 0
[/dev/sdf1].flush_io_errs 0
[/dev/sdf1].corruption_errs 0
[/dev/sdf1].generation_errs 0
[/dev/sdh1].write_io_errs 16647
[/dev/sdh1].read_io_errs 204
[/dev/sdh1].flush_io_errs 0
[/dev/sdh1].corruption_errs 0
[/dev/sdh1].generation_errs 0
Script Finished Oct 16, 2023 11:50.37

Just started a scrub with repair corrupted blocks to see if it does anything.
 

Edited by LoyalScotsman
17 minutes ago, JorgeB said:

Crashes may be related to macvlan, change to ipvlan or see the 6.12.4 release notes for an alternative way to keep using macvlan.

 

These suggest this device dropped offline at some point, see here for more info.

OK, I will implement the change from macvlan to ipvlan to test this. Fingers crossed this fixes it.

I have also checked that link and ran the command to clear the errors, which didn't help. I had to power down and reset the cache disks, so I am going to look into the cable on the cache disks just in case.

  • 4 weeks later...
On 10/16/2023 at 12:06 PM, JorgeB said:

Crashes may be related to macvlan, change to ipvlan or see the 6.12.4 release notes for an alternative way to keep using macvlan.

 

These suggest this device dropped offline at some point, see here for more info.

Hello

 

Just thought I would give an update: uptime is now 28 days. So far it seems more stable with the macvlan change, but I will continue to monitor it. So far so good.
