Unmountable Disk and stalling emhttp



I've seen elsewhere not to reformat, so I have not done that.

 

The disk that has become unmountable has been used as an NFS mount for a vSphere server for the past couple of years without issue. Over the last couple of days I have been experiencing some issues with my unRAID box: it would lock up, emhttp would die, nothing would appear in the syslog, and the box would become unresponsive even to a poweroff command, meaning a hard reset was the only thing to do.

 

After some troubleshooting I think I have pinpointed the vSphere connection as the likely cause of the issues (though nothing has changed in that space for months).

 

I disabled auto boot and fired the VMs up one at a time. Everything seemed fine for a while, so to be safe I decided to take some snapshots of the guests. That's where things went bad.

The box began to log a bunch of

Mar 18 20:24:57 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 81155249: bit already cleared

and

Mar 18 20:36:36 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/.lck-45ff0b0000000000 (30) Read-only file system

 

I decided to stop the array and restart it to remount read/write. Now the drive is unmountable.

 

Could something be wrong with the drive that caused the issues described above (SMART seems OK), and should I just replace it?

 

What steps do I take from here to restore the server to its previous glory?

Please provide as much detail as possible. I work in IT and I'm a gun at Windows admin, but Linux is a whole different kettle of fish. I can follow along if pointed in the right direction, but please don't assume I know what I'm doing by just naming a command to run. (Kind of petrified of destroying my NAS.)

unraid-diagnostics-20170318-2049.zip

unraid-syslog-20170318-2049.zip


Thanks for the quick response. I've started a reiserfsck --check /dev/md1.

 

Reading the link you provided: if I get anything other than --fix-fixable as a result, what action should I take, given that this box hosts my VMs? Basically every file on that disk is important to the core system running correctly. Is --fix-fixable even appropriate for a disk of this type?
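For what it's worth, --check itself only reports; its closing summary names the repair flag to try next (--fix-fixable for minor damage, --rebuild-tree when the tree itself is broken). A tiny illustrative sketch of that decision, where the summary strings are assumptions based on typical reiserfsck output rather than exact quotes of the tool:

```shell
#!/bin/sh
# Illustrative only: map a reiserfsck --check summary line to the repair
# flag it recommends. Check rebuild-tree first, since its message can
# also mention "fixable" corruptions.
suggest_next_step() {
    summary="$1"
    case "$summary" in
        *rebuild-tree*) echo "reiserfsck --rebuild-tree" ;;
        *fix-fixable*)  echo "reiserfsck --fix-fixable" ;;
        *)              echo "no repair flag suggested" ;;
    esac
}
```

So a summary like "2 found corruptions can be fixed only when running with --rebuild-tree" maps to the rebuild-tree run.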


From what I was reading in other posts, that's likely the best outcome, as even rebuilding the disk from parity risks rebuilding the corruption along with it?

 

Unfortunately it just returned the below:

2 found corruptions can be fixed only when running with --rebuild-tree
 

Do you advise just running this and hoping for the best? What steps from there to identify any loss?


Rebuilding a disk won't fix file system corruption.

 

You need to use --rebuild-tree. reiserfsck is usually very good at repairing the filesystem without issues. When it finishes, look for a lost+found folder; if there are files there, they may be corrupt.
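A quick sketch of triaging that lost+found afterwards (path is a placeholder; on unRAID it would be under the disk mount, e.g. /mnt/disk1/lost+found). Recovered VM flat disks are usually the biggest files, so listing largest-first makes them easy to spot:

```shell
#!/bin/sh
# Sketch: list files in a lost+found directory, largest first.
# Uses GNU find's -printf (%s = size in bytes, %f = basename).
list_recovered() {
    dir="$1"
    find "$dir" -maxdepth 1 -type f -printf '%s %f\n' | sort -rn
}
```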

Edited by johnnie.black

Seriously, thanks again for the help. The output from that run looks quite bad; hopefully you can shed some expert advice on it.

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Sat Mar 18 21:28:40 2017
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 226344267 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 226313700 blocks will be read
0%block 2523084: The number of items (1531) is incorrect, should be (1) - corrected
block 2523084: The free space (65269) is incorrect, should be (4045) - corrected
pass0: vpf-10110: block 2523084, item (0): Unknown item type found [4094427913 185720068 0xfdf30df6 ??? (15)] - deleted
Segmentation fault

 


There's apparently a problem with the latest reiserfsprogs that sometimes can't complete a --rebuild-tree. I've seen it happen 2 or 3 times before; it was solved by downgrading to unRAID v6.2.4, which includes an earlier version, and then running --rebuild-tree again.

Important: before rebooting, grab and post new diagnostics; it may help LT identify the problem.

Edited by johnnie.black

So after running all night it only put 3 files in lost+found. Pretty happy with that (ignore the perms below, this is a Windows dump). My question is: given the last write times, I assume these are from an old VM I haven't used for a while (and can safely delete them).

 

But if I did want to try to restore them to their original locations, are there any 'tricks of the trade' to identify where they used to live?

    Directory: \\10.0.0.21\disk1\lost+found


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
------       10/12/2014   3:43 PM    66836967424 3778_4061
------        1/08/2015  10:09 PM             84 10256_31
------       26/08/2015   5:25 AM     4168192000 26_14345
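One possible trick, offered as a sketch rather than a proven recipe: ReiserFS lost+found names like 3778_4061 encode internal object IDs, not the original filename, but VMware descriptor .vmdk files are small text files that record each "-flat.vmdk" extent's size in 512-byte sectors. Matching a recovered file's size against the descriptors on the share can suggest which VM it belonged to. The /mnt/disk1/vSphere path here is taken from the logs above; adjust as needed:

```shell
#!/bin/sh
# Sketch: given a recovered file and the VM root directory, find any
# descriptor .vmdk whose extent sector count matches the file's size.
# Descriptor extent lines look like: RW <sectors> VMFS "<name>-flat.vmdk"
match_descriptor() {
    recovered="$1"   # e.g. /mnt/disk1/lost+found/3778_4061
    vmroot="$2"      # e.g. /mnt/disk1/vSphere
    sectors=$(( $(stat -c %s "$recovered") / 512 ))
    # -F: literal string; skip the big flat files, only scan descriptors
    grep -rlF --include='*.vmdk' --exclude='*-flat.vmdk' \
        "RW $sectors " "$vmroot" 2>/dev/null
}
```

This only works for the flat extents (the 66 GB and 4 GB files above); the tiny 84-byte file would need a different approach, such as just reading its contents.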

A couple more questions:

  1. Should I take any more steps from here to complete the process? I.e., format the drive, or replace it and let parity do its job, or should it be OK as-is?
  2. Should I update to 6.3.2 again? Assuming there is no reason not to.
  3. Do my issues with emhttp sound related to a corrupt drive? I can't see any out-of-memory errors that would crash it.

OK, so that seems to be working fine. I ran a parity check for the sake of it too.

 

I started my first VM after that, and after about 5 minutes I started getting the below and the VM locked up. Could you advise what could have changed for unRAID to start treating this as a read-only share? As I said, I haven't changed anything in this space for ages. Syslog/diagnostics attached.

Mar 19 20:26:43 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (5) Input/output error
Mar 19 20:26:43 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 391156233: bit already cleared
Mar 19 20:26:43 unRAID kernel: REISERFS (device md1): Remounting filesystem read-only
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system

 

unraid-syslog-20170319-2037.zip

unraid-diagnostics-20170319-2039.zip


There are file system problems on disk1 again; you need to run reiserfsck one more time.

 

Since they are recurring, once it's fixed again it's probably best to move all the data off that disk and format it. You can use XFS; it's the current preferred file system for data disks.

 

P.S.: VMs would perform much better on your cache disk.

P.P.S.: No need to post the syslog; it's included in the diagnostics.

 

 


No worries. Well, I'll kick that process off shortly then.

 

To be clear on my setup (re VMs running better on cache): I currently run vSphere connected via NFS, and I use disk1 for the parity protection it provides. What would you suggest if I were to move the VMs to cache while keeping them protected?


Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would still have them on the second drive as a backup?

 

EDIT: --check just finished. No errors returned this time.

Edited by kingy444
update
2 minutes ago, kingy444 said:

Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would still have them on the second drive as a backup?

 

Yes

 

3 minutes ago, kingy444 said:

EDIT: --check just finished. No errors returned this time

 

There are clearly still issues; I would proceed by moving all the data to other disk(s) and formatting that one as XFS.


Started that; it will likely take me a couple of days, unfortunately.

 

Quick question: obviously an SSD is going to be the quickest option, but that will set me back some $$$.

 

Currently I have a WD Black 7200 RPM drive as cache. Duplicating that is much cheaper, and I haven't had too much of an issue with write speeds thus far. What are your thoughts?


If you are happy with the current cache speed, I'd say use another Black. A RAID1 cache pool will still have faster writes than the array, so the VMs will also have better write speed; reads will be similar to what they were.


So the move of the VMs has gone much quicker than planned, with one exception.

 

I have hit a snag moving two of them. I've basically been using mv -v src dest and that's been fine, but with these two particular VMs the command doesn't seem to go anywhere. Once it reaches the flat vmdk file it looks like it's taking its time to copy, so I let it do its thing; when I check back a couple of hours later it's at the same spot but the disks have spun down. Nothing I can see in the syslog.

 

Any pointers on where to go from here? It can't be the file size, as other VMs were the same size or bigger.
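In the meantime, one workaround I'm considering for the stuck ones: rsync -a --partial can show progress and resume a partial transfer, and more generally a copy-verify-delete pattern is safer than a bare mv across filesystems, since the source is only removed after the copy compares byte-for-byte. A minimal portable sketch using plain cp (the function name and paths are placeholders):

```shell
#!/bin/sh
# Sketch: copy a file, verify the copy byte-for-byte with cmp, and only
# then remove the source. Returns non-zero if either step fails, leaving
# the source untouched.
move_verified() {
    src="$1"; dest_dir="$2"
    cp "$src" "$dest_dir"/ || return 1
    cmp -s "$src" "$dest_dir/$(basename "$src")" || return 1
    rm "$src"
}
```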

