kingy444 Posted March 18, 2017 (edited) I've seen elsewhere not to reformat, so I have not done that. The disk that has become unmountable has been used as an NFS mount for a vSphere server for the past couple of years without issue. Over the last couple of days I have been experiencing some issues with my unRAID box: it would lock up, no emhttp, no entries in the syslog, and the box becomes unresponsive to even a poweroff command, meaning a hard reset is the only option. After some troubleshooting I pinpointed the vSphere connection as the likely cause (though nothing has changed for months in that space). I disabled auto boot and fired the VMs up one at a time. Everything seemed fine for a while, so I decided to be safe and take some snapshots of the boxes. That's where things went bad. The box began to show a bunch of

Mar 18 20:24:57 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 81155249: bit already cleared

and

Mar 18 20:36:36 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/.lck-45ff0b0000000000 (30) Read-only file system

I decided to stop the array and try to restart it to remount as read/write. Now the drive is unmountable. Could something be wrong with the drive that caused the issues described above (SMART seems OK), and should I just replace it? What steps do I take from here to restore the server to its previous glory? Please provide as much detail as possible. I work in IT and I'm a gun at Windows admin, but Linux is a whole different kettle of fish. I can understand if pointed in the right direction, but please don't assume I know what I'm doing by just naming a command to run. (Kind of petrified of destroying my NAS.)

unraid-diagnostics-20170318-2049.zip unraid-syslog-20170318-2049.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 18, 2017 You need to check the file system on disk1 (md1): https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_ReiserFS_using_unRAID_v5_or_later
kingy444 Posted March 18, 2017 Author Thanks for the quick response. I've started a reiserfsck --check /dev/md1. Reading the link you provided: if I get anything other than --fix-fixable as a result, what action should I take, given that this box hosts my VMs? Basically any file on that disk is important to the core system running correctly. Is --fix-fixable even appropriate for a disk of this type?
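For reference, the check step being run here looks like this. This is a sketch: it assumes the array has been started in maintenance mode so /dev/md1 exists but is not mounted, and --check is read-only, so it reports problems without changing anything.

```shell
# Read-only filesystem check on disk1; reports problems without fixing them.
# Assumes the array is in maintenance mode so /dev/md1 is not mounted.
reiserfsck --check /dev/md1
```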
JorgeB Posted March 18, 2017 Basically you'll need to do what reiserfsck tells you. --fix-fixable is the best result, since it means little corruption.
kingy444 Posted March 18, 2017 Author From what I was reading in other posts, that's likely the best outcome, as even just replacing the disk has a risk of replicating the corruption? Unfortunately it just returned the below:

2 found corruptions can be fixed only when running with --rebuild-tree

Do you advise to just run this and hope for the best? What steps from there to identify loss?
JorgeB Posted March 18, 2017 (edited) Rebuilding a disk won't fix file system corruption. You need to use --rebuild-tree. reiserfsck is usually very good at repairing the filesystem without issues. At the end, look for a lost+found folder; if there are files there, they may be corrupt. Edited March 18, 2017 by johnnie.black
kingy444 Posted March 18, 2017 Author Seriously, thanks again for the help. I feel the output from that was quite bad. Hopefully you can shed some expert advice on it:

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Sat Mar 18 21:28:40 2017
###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 226344267 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 226313700 blocks will be read
0%block 2523084: The number of items (1531) is incorrect, should be (1) - corrected
block 2523084: The free space (65269) is incorrect, should be (4045) - corrected
pass0: vpf-10110: block 2523084, item (0): Unknown item type found [4094427913 185720068 0xfdf30df6 ??? (15)] - deleted
Segmentation fault
JorgeB Posted March 18, 2017 There's apparently a problem with the latest reiserfsprogs that sometimes can't complete a --rebuild-tree. I've seen it happen 2 or 3 times before; it was solved by downgrading to unRAID v6.2.4, which includes an earlier version, and then running --rebuild-tree again.
JorgeB Posted March 18, 2017 (edited) Important: before rebooting, grab and post new diagnostics; it may help LT identify the problem. Edited March 18, 2017 by johnnie.black
kingy444 Posted March 18, 2017 Author (edited) Diagnostics attached. Is there a quick way to downgrade, or do I need to download the version and overwrite files manually? I can only see the manual method listed through Google (the update through the interface really made everything too easy, and I haven't unplugged my USB for ages). unraid-diagnostics-20170318-2142.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 18, 2017 (edited) Download v6.2.4, overwrite the three bz* files using the flash share, and reboot. Edited March 18, 2017 by johnnie.black
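Sketched out, assuming the flash drive is mounted at /boot on the server (the path to the extracted v6.2.4 zip is a placeholder, and backing up the current files first is my addition, not part of the official procedure):

```shell
# Back up the current kernel/rootfs files before overwriting them.
mkdir -p /boot/previous
cp /boot/bz* /boot/previous/

# Then copy the bz* files from the extracted v6.2.4 release over the top and reboot:
# cp /path/to/unRAIDServer-6.2.4/bz* /boot/
# reboot
```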
kingy444 Posted March 18, 2017 Author So after that ran all night, it only put 3 files in lost+found. Pretty happy with that (ignore the permissions below, this is a Windows dump). My question is: given the last write times, I assume these to be from an old VM I haven't used for a while (and can safely delete them). But if I did want to try to restore them to their original location, are there any tricks of the trade to identify where they used to live?

Directory: \\10.0.0.21\disk1\lost+found

Mode LastWriteTime Length Name
---- ------------- ------ ----
------ 10/12/2014 3:43 PM 66836967424 3778_4061
------ 1/08/2015 10:09 PM 84 10256_31
------ 26/08/2015 5:25 AM 4168192000 26_14345

A couple more questions:
Should I take any more steps from here to complete the process? i.e. format the drive, or replace it and let parity do its job, or should it be OK?
Should I update to 6.3.2 again, assuming there is no reason not to?
Do my issues with emhttp sound related to a corrupt drive? I can't see any out-of-memory errors that would crash it.
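On identifying lost+found files: since the original paths are gone after a tree rebuild, one approach is to inspect content rather than names. `file` guesses a type from the contents, and `strings` can surface embedded text (for example, a vmdk descriptor names its extents). This is a general-purpose sketch, not an unRAID-specific procedure; the file names are the ones from the listing.

```shell
cd /mnt/disk1/lost+found

# Guess each file's type from its content.
file 3778_4061 10256_31 26_14345

# Look for readable text that hints at the owner; the small 84-byte file
# is a likely candidate for a vmdk descriptor or lock file.
strings 10256_31 | head -20
```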
JorgeB Posted March 18, 2017 The disk should be OK now, but you can run reiserfsck --check to confirm. If all is well, upgrade back to v6.3.2 and see if the emhttp issues are gone.
kingy444 Posted March 19, 2017 Author (edited) OK, so that seems to be working. I ran a parity check for the sake of it too. I started my first VM after that, then after about 5 minutes started getting the below, and the VM locked up. Could you advise what could have changed to make unRAID treat this as a read-only share? As I said, I haven't changed anything in this space for ages. Syslog/diagnostics attached.

Mar 19 20:26:43 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (5) Input/output error
Mar 19 20:26:43 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 391156233: bit already cleared
Mar 19 20:26:43 unRAID kernel: REISERFS (device md1): Remounting filesystem read-only
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system

unraid-syslog-20170319-2037.zip unraid-diagnostics-20170319-2039.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 19, 2017 There are file system problems again on disk1; you need to run reiserfsck one more time. Since they are recurring, once it's fixed again it may be best to move all data off that disk and format it. You can use XFS; it's the current preferred file system for data disks. PS: VMs would perform much better on your cache disk. PPS: No need to post the syslog separately, it's included with the diagnostics.
kingy444 Posted March 19, 2017 Author No worries, I'll kick that process off shortly then. To be clear on my setup (re VMs running better on cache): I currently run vSphere connected via NFS, and I use disk1 for the parity protection it provides. What would you suggest if I were to move the VMs to cache in order to keep them backed up?
JorgeB Posted March 19, 2017 You'd need to create a cache pool by changing the current cache file system to btrfs and adding another disk; it would then be protected in RAID1.
kingy444 Posted March 19, 2017 Author (edited) Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would have them on the second disk as a backup? EDIT: --check just finished. No errors returned this time. Edited March 19, 2017 by kingy444 update
JorgeB Posted March 19, 2017 2 minutes ago, kingy444 said: just to clarify, that would mean that by having multiple cache drives, the vm's would be stored on both cache drives (RAID1) and if i lost the cache drive i would have them on the second disk as a backup? Yes. 3 minutes ago, kingy444 said: EDIT: --check just finished. No errors returned this time There are clearly still issues; I would proceed by moving all data to other disk(s) and formatting that one with XFS.
kingy444 Posted March 19, 2017 Author Started that; it will likely take me a couple of days unfortunately. Quick question: obviously an SSD is going to be the quickest option, but that will set me back some $$$. Currently I have a WD Black 7200 RPM drive as cache; duplicating that is much cheaper, and I haven't had too much of an issue so far with write speeds. What are your thoughts?
JorgeB Posted March 19, 2017 If you are happy with the current cache speed, I'd say use another Black. A RAID1 cache pool will still have faster writes than the array, so the VMs will also have better write speed; reads will be similar to what they were.
kingy444 Posted March 20, 2017 Author So the move of the VMs has gone much quicker than planned, with one exception: I have hit a snag moving two of them. Basically I'm using mv -v src dest and that's been fine, but when moving two particular VMs the command doesn't seem to go anywhere. Once it reaches the flat vmdk file it looks like it's taking its time to copy, so I let it do its thing; I check back a couple of hours later and it's at the same spot, but the disks are now spun down. Nothing I can see in the syslog. Any pointers where to go from here? It can't be the file size, as other VMs were the same size or bigger.
JorgeB Posted March 20, 2017 Try using Midnight Commander (mc); if there's an error there should be info to help find the reason. The VMs need to be shut down.
kingy444 Posted March 21, 2017 Author Couldn't work Midnight Commander out. I managed to use WinSCP to download a copy to my PC. Now that I have reformatted to XFS, are there any additional checks I should do to ensure the integrity of the disk?
JorgeB Posted March 21, 2017 A newly formatted disk should always be good.
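For peace of mind, XFS does have its own checker. A no-modify dry run looks like this; as with reiserfsck, this assumes the array is in maintenance mode so /dev/md1 is unmounted.

```shell
# -n = no-modify mode: report problems without fixing anything.
# A clean filesystem produces a short report with no error lines.
xfs_repair -n /dev/md1
```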