Continuous crashes.. help (SOLVED)

Helmonder · February 19, 2020

System just crashed again...

System totally unavailable: no shares, no web GUI, no telnet. I can connect with IPMI; this is what I see on the screen (see capture).

Through the console I was able to copy the syslog to /boot. It is attached.

I then issued a shutdown -r now.

After the graceful shutdown timeout expires the system forces a shutdown; this also does not succeed and the system "hangs" for several minutes, at which point I use the physical off switch to have the system reboot.

I will reboot again... This has been going on for several weeks... Sometimes it keeps running for a few days, sometimes a week.. And then again a crash...

System has been stable for -years-.. This has been happening since the last two versions (I am running the most current at the moment). It happens in and out of safe mode, no difference. To be honest I do not know if I was running in safe mode during this last crash; I might not have been. I am now again.

System has booted up and is running again.

syslog

Edited February 27, 2020 by Helmonder

Helmonder · February 19, 2020

I am seeing some BTRFS errors.... Running a scrub ... I see the following in the console:

BTRFS errors have not always been there though, I expect this to be more of a result of the constant crashes than a cause..

I think I will need to run a fsck..

Did a dev stats before:

Doing a --readonly first:

[1/7] checking root items
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
bad block 368558080
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
incorrect offsets 6286 3566737550
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
Found file extent holes:
	start: 0, len: 4096
root 5 inode 28307406 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23729 namelen 32 name dcb5f3abeccef1ddf67afb89f3041dee filetype 1 errors 4, no inode ref
root 5 inode 28307407 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23730 namelen 32 name a1ed8f397213b5bf29bcd65f83c83487 filetype 1 errors 4, no inode ref
root 5 inode 28307408 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23731 namelen 36 name a1ed8f397213b5bf29bcd65f83c83487.txt filetype 1 errors 4, no inode ref
root 5 inode 28307409 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23732 namelen 32 name 655a811d730b9a00c3603353a409b145 filetype 1 errors 4, no inode ref
root 5 inode 28307410 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23733 namelen 32 name 212a89a0586ff984eafa34bdfdca96dd filetype 1 errors 4, no inode ref
root 5 inode 28307411 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23734 namelen 36 name 212a89a0586ff984eafa34bdfdca96dd.txt filetype 1 errors 4, no inode ref
root 5 inode 28307412 errors 2001, no inode item, link count wrong

And a looooooooooooooooooooooooooooooot more lines of the same...
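With this many repeated lines, a small script can boil the check output down to counts per error type. Below is a minimal Python sketch, assuming the output was saved as text and that every inode line follows the "root N inode M errors X, description, description" shape seen above. The function name summarize and the SAMPLE excerpt are only illustrative and are not part of btrfs-progs.

```python
import re
from collections import Counter

# A few lines copied from the check output above, used as sample input.
SAMPLE = """\
incorrect offsets 6286 3566737550
bad block 368558080
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
root 5 inode 28307406 errors 2001, no inode item, link count wrong
\tunresolved ref dir 85889 index 23729 namelen 32 name dcb5f3abeccef1ddf67afb89f3041dee filetype 1 errors 4, no inode ref
root 5 inode 28307407 errors 2001, no inode item, link count wrong
"""

# Assumed line shape: "root <id> inode <nr> errors <code>, <desc>[, <desc>...]"
INODE_RE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-f]+), (.+)$")


def summarize(text):
    """Return (number of affected inodes, Counter of error descriptions)."""
    kinds = Counter()
    inodes = set()
    for line in text.splitlines():
        match = INODE_RE.match(line.strip())
        if not match:
            continue  # skip "incorrect offsets", "unresolved ref", etc.
        inodes.add((int(match.group(1)), int(match.group(2))))
        for kind in match.group(4).split(", "):
            kinds[kind] += 1
    return len(inodes), kinds


count, kinds = summarize(SAMPLE)
print(f"affected inodes: {count}")
for kind, n in kinds.most_common():
    print(f"{n:6d}  {kind}")
```

To use it on a real run, replace SAMPLE with the contents of the saved output file. The counts only describe what the check reported; they do not say whether the damage is repairable.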

Running without --readonly and with --repair now..

Following is the result... Does not seem that it can fix it:

Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
items overlap, can't fix
check/main.c:4333: fix_item_offset: BUG_ON `ret` triggered, value -5
/sbin/btrfs[0x42f1fd]
/sbin/btrfs[0x438339]
/sbin/btrfs[0x4387f0]
/sbin/btrfs[0x43937c]
/sbin/btrfs[0x43d255]
/sbin/btrfs(main+0x90)[0x40ecc0]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x14f743d22e5b]
/sbin/btrfs(_start+0x2a)[0x40ef4a]

Repeating the --repair cycle three times gives the exact same result..

Please advise?

Am now attempting to copy the complete contents of the cache drive to a folder in the array... With the intention to reformat the cache drive and copy back the data after that (doing this with dockers and VMs disabled so the array should be quiet...).. Errors in the log keep rolling though... The same BTRFS errors.. Hoping the copy will work without a crash...

I still think the BTRFS issue is not causing the crash, so it should work... I will also try and change the FS type of the cache drive to XFS.. I am not using a cache pool, just one 1TB M.2 SSD... Without using a pool I do not quite see the advantage of BTRFS... There is not a lot of info on file system fixes out there..

Edited February 19, 2020 by Helmonder

JorgeB · February 19, 2020

1 hour ago, Helmonder said:

Do not use --repair unless you are advised to do so by a developer

--repair can only fix a small number of issues, and sometimes it makes things even worse. You should back up any data on cache and re-format the pool; there are some recovery options in the FAQ if needed.

Helmonder · February 19, 2020

I am trying to find the best solution by searching in the forum.. This was the best I could find.

Helmonder · February 20, 2020

Copied over what I could from the cache drive (my god, Plex has a LOT of files and folders).. I have now formatted the cache drive (with XFS) and am copying everything back...

Helmonder · February 21, 2020

Well.. Plex could not be saved.. I am reinstalling it.. The rest is up again. Without BTRFS..

Helmonder · February 27, 2020

The issue appears to be related to a bug in the system that causes specific docker IP assignments to be unstable. More explanation in the following thread: https://forums.unraid.net/topic/89038-fatal-system-crash/?tab=comments#comment-826221
