
Continuous crashes.. help (SOLVED)



System just crashed again... 

 

The system is totally unavailable: no shares, no web GUI, no telnet. I can connect via IPMI; this is what I see on the screen (see capture).

 

Through the console I was able to copy the syslog to /boot. It is attached.

 

I then issued a shutdown -r now.

After the graceful shutdown timeout expires, the system forces a shutdown. This does not succeed either, and the system "hangs" for several minutes, at which point I use the physical power switch to make the system reboot.

 

I will reboot again... This has been going on for several weeks... Sometimes it keeps running for a few days, sometimes a week.. And then again a crash...

 

The system has been stable for -years-. This has been happening since the last two versions (I am running the most current one at the moment). It happens in and out of safe mode, no difference. To be honest, I do not know if I was running in safe mode during this last crash; I might not have been. I am now again.

 

System has booted up and is running again.

Capture.JPG

syslog

Edited by Helmonder

I am seeing some BTRFS errors... Running a scrub... I see the following in the console:

 

The BTRFS errors have not always been there though. I expect them to be more a result of the constant crashes than a cause.

Capture2.JPG

 

I think I will need to run a fsck.. 

 

I ran a dev stats first:

 

Capture3.JPG

 

Running the check with --readonly first:

 

[1/7] checking root items
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
bad block 368558080
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
incorrect offsets 6286 3566737550
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
Found file extent holes:
	start: 0, len: 4096
root 5 inode 28307406 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23729 namelen 32 name dcb5f3abeccef1ddf67afb89f3041dee filetype 1 errors 4, no inode ref
root 5 inode 28307407 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23730 namelen 32 name a1ed8f397213b5bf29bcd65f83c83487 filetype 1 errors 4, no inode ref
root 5 inode 28307408 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23731 namelen 36 name a1ed8f397213b5bf29bcd65f83c83487.txt filetype 1 errors 4, no inode ref
root 5 inode 28307409 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23732 namelen 32 name 655a811d730b9a00c3603353a409b145 filetype 1 errors 4, no inode ref
root 5 inode 28307410 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23733 namelen 32 name 212a89a0586ff984eafa34bdfdca96dd filetype 1 errors 4, no inode ref
root 5 inode 28307411 errors 2001, no inode item, link count wrong
	unresolved ref dir 85889 index 23734 namelen 36 name 212a89a0586ff984eafa34bdfdca96dd.txt filetype 1 errors 4, no inode ref
root 5 inode 28307412 errors 2001, no inode item, link count wrong

And a looooooooooooooooooooooooooooooot more lines of the same...
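
Since the output runs to many near-identical lines, a small script can give an overview of how many inodes are affected and by which kind of error. This is only a rough sketch: the line pattern is inferred from the lines pasted above, not from any documented output format, and the function and variable names are made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the pasted output (an assumption, not a documented format), e.g.:
#   root 5 inode 28307406 errors 2001, no inode item, link count wrong
INODE_LINE = re.compile(r"^root (\d+) inode (\d+) errors ([0-9a-f]+), (.+)$")


def summarize_check_output(text):
    """Return (number of distinct inodes with errors, Counter of error descriptions)."""
    by_description = Counter()
    inodes = set()
    for line in text.splitlines():
        match = INODE_LINE.match(line.strip())
        if match:
            root, inode, _code, description = match.groups()
            inodes.add((int(root), int(inode)))
            by_description[description] += 1
    return len(inodes), by_description


# Three lines copied from the check output above, used as sample input.
sample = """\
root 5 inode 28307375 errors 500, file extent discount, nbytes wrong
root 5 inode 28307406 errors 2001, no inode item, link count wrong
root 5 inode 28307407 errors 2001, no inode item, link count wrong
"""

total, counts = summarize_check_output(sample)
print(total, "inodes with errors")
for description, count in counts.most_common():
    print(f"{count:6d}  {description}")
```

To use it on the real output, the check output would first have to be saved to a text file and read in place of `sample`.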

 

Running without --readonly and with --repair now..

 

The following is the result. It does not seem to be able to fix it:

 

Do not use --repair unless you are advised to do so by a developer
	or an experienced user, and then only after having accepted that no
	fsck can successfully repair all types of filesystem corruption. Eg.
	some software or hardware bugs can fatally damage a volume.
	The operation will start in 10 seconds.
	Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
incorrect offsets 6286 3566737550
items overlap, can't fix
check/main.c:4333: fix_item_offset: BUG_ON `ret` triggered, value -5
/sbin/btrfs[0x42f1fd]
/sbin/btrfs[0x438339]
/sbin/btrfs[0x4387f0]
/sbin/btrfs[0x43937c]
/sbin/btrfs[0x43d255]
/sbin/btrfs(main+0x90)[0x40ecc0]
/lib64/libc.so.6(__libc_start_main+0xeb)[0x14f743d22e5b]
/sbin/btrfs(_start+0x2a)[0x40ef4a]

 

Repeating the --repair cycle three times gives the exact same result.

 

Please advise?

 

I am now attempting to copy the complete contents of the cache drive to a folder on the array, with the intention of reformatting the cache drive and copying the data back afterwards (doing this with Docker and VMs disabled, so the array should be quiet). Errors keep rolling through the log though, the same BTRFS errors. Hoping the copy will work without a crash...
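
For a copy off a filesystem that is throwing errors, it helps if one unreadable file does not abort the whole run. Below is a minimal sketch of that idea, not what I actually ran: the function name is hypothetical, any source and destination paths are placeholders, and it preserves less than a proper tool would (no ownership, and symlinked directories are skipped rather than copied).

```python
import os
import shutil


def copy_tree_logging_failures(src_root, dst_root):
    """Copy a directory tree, recording failures instead of stopping on them.

    Returns a list of (path, error message) tuples for everything that could
    not be read or written.
    """
    failed = []

    def on_walk_error(exc):
        # Called by os.walk when a directory cannot be listed.
        failed.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=on_walk_error):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        try:
            os.makedirs(target_dir, exist_ok=True)
        except OSError as exc:
            failed.append((target_dir, str(exc)))
            continue
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                # copy2 keeps timestamps and permission bits; symlinks to
                # files are recreated as symlinks rather than followed.
                shutil.copy2(src, dst, follow_symlinks=False)
            except OSError as exc:
                failed.append((src, str(exc)))
    return failed
```

Calling it with the cache mount point and an array folder (both of which depend on the system) returns the list of files that failed, which can then be checked against the inodes reported by the filesystem check.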

 

I still think the BTRFS issue is not causing the crashes, so this should work. I will also try changing the filesystem type of the cache drive to XFS. I am not using a cache pool, just one 1TB M.2 SSD, and without a pool I do not quite see the advantage of BTRFS. There is not a lot of info on filesystem fixes out there.

 

 

