kernel panic after 6.2 upgrade


tazman


Hi,

I am having massive problems with 6.2. See my boilerplate for my config.

After the upgrade from 6.1.9 everything seemed to be running well, and I added a second parity drive, which was built without problems.

Then I noticed that the system became unresponsive, first on the shares and then in the webgui. Eventually I had to take it down with poweroff as nothing else worked. Note that, out of habit, I tried powerdown first, and that the powerdown script from 6.1.9 was still installed.

After a restart everything looked normal in the log at first, but the array could not be started. It would sit on "Mounting disks..." and then messages about xfs errors and tainted mounts appeared, followed eventually by kernel traces and kernel panics, culminating with (not in the attached log):

Kernel panic - not syncing: Fatal exception in interrupt

Shutting down cpus with NMI

Kernel Offset: disabled

I reverted to 6.1.9. A memtest didn't show any errors. I was able to start the array, and no errors occurred.

Back on 6.1.9, xfs_repair -v (and -vn) reported the following error on ALL of my 14 data drives: bad primary superblock - bad magic number !!!

I interrupted "attempting to find secondary superblock..." on a 3TB drive after it had been running without any results for 2 hours.

The drive sdc1, which shows up with the "bad dir block magic!" error message in the log file, is the second parity drive, at least after I downgraded, which may have changed the drive numbering.

I am at a loss now as to what to do, as I don't want to do something that makes things worse or even unrecoverable.

I welcome your advice on what to do next.

Thanks,

Thomas

syslog.txt


Thanks johnnie.black.

I rebooted into 6.2 and ran xfs_repair against all 14 md devices. When I use sd* I still get the same error message as before, but I assume that this was wrong to begin with and that I should have used sd*1, which produces the same result as md*.

None of the checks returned any errors, and I am really wondering what was wrong in the first place.

I will continue monitoring.
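
For anyone who finds this thread later, here is a rough sketch of why pointing xfs_repair at sd* instead of sd*1 gives "bad magic number" even on a healthy file system. An XFS primary superblock starts with the four bytes "XFSB", and those bytes sit at the start of the partition, not at the start of the whole disk (which holds the partition table). The in-memory "devices", the partition offset and the helper name has_xfs_magic are made up for illustration; this is a toy model in Python, not a replacement for xfs_repair.

```python
import io

XFS_MAGIC = b"XFSB"  # first four bytes of an XFS primary superblock


def has_xfs_magic(dev) -> bool:
    """Return True if the stream starts with the XFS superblock magic."""
    dev.seek(0)
    return dev.read(4) == XFS_MAGIC


# Toy images only (assumption): a real disk would be opened read-only
# as a block device instead of being built in memory.
SECTOR = 512
PART_START = 64 * SECTOR  # hypothetical offset of the first partition

filesystem = XFS_MAGIC + bytes(4096 - len(XFS_MAGIC))  # fake superblock area
whole_disk = bytes(PART_START) + filesystem            # partition-table area, then the partition

sdx = io.BytesIO(whole_disk)                # stands in for the whole disk (sd*)
sdx1 = io.BytesIO(whole_disk[PART_START:])  # stands in for the partition (sd*1)

print(has_xfs_magic(sdx))   # False: offset 0 of the disk is not the superblock
print(has_xfs_magic(sdx1))  # True: the partition starts with "XFSB"
```

So the error on sd* most likely just meant the tool was looking in the wrong place, which would fit with the md* checks coming back clean.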


Ok, good to know. I read that maintenance mode is needed for that as well.

Anyhow. I have been wondering for a while: if you suspect that your device is faulty and you run corrections on it, doesn't "maintaining parity" make the parity data faulty as well?

If something is really wrong with a disk, wouldn't it make more sense not to touch the parity disk at all, assuming that it still holds better/more accurate data that could be used to rebuild the faulty disk from scratch if needed?


Anyhow. I have been wondering for a while: if you suspect that your device is faulty and you run corrections on it, doesn't "maintaining parity" make the parity data faulty as well?

If the disk already corresponds to parity then you will want it to keep on doing so.

If something is really wrong with a disk, wouldn't it make more sense not to touch the parity disk at all, assuming that it still holds better/more accurate data that could be used to rebuild the faulty disk from scratch if needed?

This is true if a disk has really failed at the physical level, but in such a case you will not be able to run a repair of the physical disk in the first place. However, if it is unmountable due to file system corruption, then that corruption is almost certainly already reflected in parity. That is why, after a rebuild of a disk marked as unmountable, the disk is still shown as unmountable at the end of the rebuild process and still needs a repair tool to fix the file system level corruption. By running in maintenance mode when doing the repair you ensure that any changes made to the physical disk are reflected in parity.
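
To make the point above concrete, here is a toy model using simple XOR parity across three small "disks". This is only an illustration under stated assumptions: it ignores the second parity drive entirely, the disk contents are invented, and the helpers xor_blocks and rebuild are hypothetical names, not anything taken from the actual array driver.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(parity, other_disks):
    """Reconstruct the missing disk from parity plus the remaining disks."""
    return xor_blocks([parity, *other_disks])


# Invented contents (assumption): 8 bytes per "disk".
disk1_good = b"GOODDATA"
disk1_bad = b"GOODDxTA"  # file-system-level corruption, written through the array
disk2 = b"12345678"
disk3 = b"abcdefgh"

# The corruption was written like any other data, so parity already matches it.
parity = xor_blocks([disk1_bad, disk2, disk3])
print(rebuild(parity, [disk2, disk3]) == disk1_bad)  # True: a rebuild restores the corruption

# Repair with parity kept in sync: a later rebuild gives the repaired data.
parity_synced = xor_blocks([disk1_good, disk2, disk3])
print(rebuild(parity_synced, [disk2, disk3]) == disk1_good)  # True

# Repair that bypasses parity: parity is now stale and a rebuild undoes the repair.
print(rebuild(parity, [disk2, disk3]) == disk1_good)  # False
```

In other words, leaving parity untouched during a file system repair does not preserve a "better" copy; it only leaves parity describing the corrupted state.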