rc8a: mover causes shfs segfault and kernel oops



I recently freed up a disk so I could use it as a cache drive, but every time the mover runs it takes my system down with kernel oops messages in the syslog.

 

I saw this thread: http://lime-technology.com/forum/index.php?topic=23406.0 but I'm getting different behaviour.

 

Here's a syslog from me rebooting the system (it was a hard reboot), starting the array (which forces a parity check because of the hard reboot), and running the mover script (while the parity check is running, though that shouldn't matter, right?). It looks like shfs segfaults and then the kernel oopses itself into oblivion shortly afterwards (usually as soon as another process tries to talk to the array).

 

I've put it on Gist because it exceeded the post length limit here: https://gist.github.com/raw/3e6e3784880779c04561/08fffa463adbee6a8663bdc2f23a167a48aa8990/gistfile1.txt
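
If anyone just wants the relevant bits, the shfs segfault and the oops lines are easy to pull straight out of the log on the box (this assumes the standard unRAID syslog location; adjust the path if yours lives elsewhere):

grep -E 'shfs|segfault|Oops' /var/log/syslog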

 

A little further digging shows that /mnt/user0 becomes completely unusable after running the mover. Here's a before:

root@seraphim:~# ls -l /mnt
total 0
drwxrwx---  9 nobody users 272 2012-12-14 00:33 cache/
drwxrwx--- 12 nobody users 360 2012-12-14 00:33 disk1/
drwxrwx---  7 nobody users 152 2012-09-11 05:55 disk2/
drwxrwx---  7 nobody users 232 2012-11-25 20:02 disk3/
drwxrwx---  7 nobody users 216 2012-09-13 18:05 disk4/
drwxrwx---  1 nobody users 272 2012-12-14 00:33 user/
drwxrwx---  1 nobody users 360 2012-12-14 00:33 user0/

 

and after running the mover:

 

root@seraphim:~# ls -l /mnt
/bin/ls: cannot access /mnt/user0: Transport endpoint is not connected
total 0
drwxrwx---  9 nobody users 272 2012-12-14 00:33 cache/
drwxrwx--- 12 nobody users 360 2012-12-14 00:33 disk1/
drwxrwx---  7 nobody users 152 2012-09-11 05:55 disk2/
drwxrwx---  7 nobody users 232 2012-11-25 20:02 disk3/
drwxrwx---  7 nobody users 216 2012-09-13 18:05 disk4/
drwxrwx---  1 nobody users 272 2012-12-14 00:33 user/
d???  ? ?      ?       ?                ? user0/
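
(For completeness: the textbook way to clear a "Transport endpoint is not connected" error is to drop the dead FUSE mount by hand, something like the below, though in my case the box usually oopses and locks up before I get the chance. As far as I can tell shfs is the FUSE process behind /mnt/user0.)

fusermount -u /mnt/user0    # detach the dead FUSE mount
umount -l /mnt/user0        # lazy unmount as a fallback if the above reports busy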


Where was the cache disk from? When you assigned the drive to 'cache' and started the array, did it first appear 'unformatted'?

 

It was originally a data disk. I copied all its data to another drive, took a screenshot of the drive list, then set the array up again by renaming super.dat and rebooting. I don't remember needing to format it (as it was already reiserfs formatted due to having been a data disk before).


Try a file system check on your cache drive:

 

1. Start the array in "Maintenance Mode".

 

2. Note which device is assigned to the cache drive; say for the sake of this example it's (sde), and type this command:

 

reiserfsck /dev/sde1

 

Answer 'Yes' to the prompt and let it run.


Looks like it's okay...

 

root@seraphim:~# reiserfsck /dev/hda1
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and  it fails **
** please  email bug reports to [email protected], **
** providing  as  much  information  as  possible --  your **
** hardware,  kernel,  patches,  settings,  all reiserfsck **
** messages  (including version),  the reiserfsck logfile, **
** check  the  syslog file  for  any  related information. **
** If you would like advice on using this program, support **
** is available  for $25 at  www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/hda1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sat Dec 22 10:43:34 2012
###########
Replaying journal: Done.
Reiserfs journal '/dev/hda1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree.. finished
Comparing bitmaps..finished
Checking Semantic tree:
finished                  
No corruptions found
There are on the filesystem:
Leaves 1
Internal nodes 0
Directories 3
Other files 0
Data block pointers 0 (0 of them are zero)
Safe links 0
###########
reiserfsck finished at Sat Dec 22 10:43:51 2012
###########


Check all of the drives.
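
If you'd rather do them all in one pass, something like this works from Maintenance Mode (the device names below are only an example; check the Main page for your actual sdX assignments, and --yes just saves typing Yes at each prompt, so skip it if your reiserfsck version doesn't accept it):

for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do    # example devices only
    echo "=== $dev ==="
    reiserfsck --check --yes $dev
done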

  • 2 months later...

Unfortunately not. :(

 

I ended up making a whole bunch of big unrelated changes (new drives, moved unRAID inside VMware with the physical drives mapped through, etc.) and just left the cache drive turned off afterwards. I didn't pursue this further, as my setup had changed quite a bit since I made this post.

 

Have you tried booting into maintenance mode and fscking all your drives, like I was told to?


I've already run it on disk1 and the cache disk. I had errors on the cache disk, which have since been fixed, but the problem persists. I can try doing it on the other disks, but I'm pretty sure the issue exists between these two drives.
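
(In case it helps anyone else: the fixable errors were cleared with reiserfsck's repair mode; a sketch below, using the cache device from earlier in the thread, so substitute your own, and take a backup first if anything important is on it.)

reiserfsck --fix-fixable /dev/hda1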

 

Another attempt I made to fix this was to define a minimum free space on each share, and to delete all drive names from the included/excluded disk settings. This still didn't work.

 

Every time the mover runs, this error happens, and my system locks up and requires a hard restart. This is not ideal.

 

I want to use the cache disk just for installing plugins, and would be happy to disable its "cache" usage entirely. Unfortunately my attempts to do this are failing: I've told each share not to use the cache, and yet it continues to do so. >:(
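
One thing worth double-checking is what the share config files on the flash actually say, in case the setting never made it to disk (I'm going from memory on the path and the key name here, so treat both as a guess):

grep -H shareUseCache /boot/config/shares/*.cfg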

 

EDIT: Looks like it actually deleted the files it was trying to move. Great!
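
(Before writing the files off completely it might be worth checking whether any of them actually landed on an array disk rather than disappearing; a rough check below, where "TV" is just a placeholder share name.)

ls -lR /mnt/cache/TV      # anything still left on the cache? ("TV" is a placeholder)
ls -lR /mnt/disk*/TV      # did they end up on one of the array disks?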


Yeah that sounds exactly like the problem I was having. Your logs look very similar to what I was seeing as well.

 

Do you have any non-cache disk slots free? It could be much simpler to add it to a standard diskX slot and then set all your shares to exclude that disk?

