Unmountable Disk and stalling emhttp



I've seen elsewhere not to reformat, so I have not done that.

 

The disk that has become unmountable has been used as an NFS mount for a vSphere server for the past couple of years without issue. Over the last couple of days I have been experiencing some issues with my unRAID box: it would lock up, emhttp would die, nothing would appear in the syslog, and the box would become unresponsive even to a poweroff command, meaning a hard reset was the only thing to do.

 

After some troubleshooting I think I have pinpointed the vSphere connection as the likely cause of the issues (though nothing has changed in that space for months).

 

I disabled auto boot and fired the VMs up one at a time. Everything seemed fine for a while, so to be safe I decided to take some snapshots of the guests. That's where things went bad.

The box began to log a bunch of

Mar 18 20:24:57 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 81155249: bit already cleared

and

Mar 18 20:36:36 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/.lck-45ff0b0000000000 (30) Read-only file system

 

I decided to stop the array and restart it to remount read/write. Now the drive is unmountable.

 

Could something be wrong with the drive that caused the issues described above (SMART seems OK), and should I just replace it?

 

What steps do I take from here to restore the server to its previous glory?

Please provide as much detail as possible. I work in IT and I'm a gun at Windows admin, but Linux is a whole different kettle of fish. I can follow along if pointed in the right direction, but please don't assume I know what I'm doing by just naming a command to run. (Kind of petrified of destroying my NAS.)

unraid-diagnostics-20170318-2049.zip

unraid-syslog-20170318-2049.zip


Thanks for the quick response. I've started a reiserfsck --check /dev/md1.

 

Reading the link you provided: if I get anything other than --fix-fixable as a result, what action should I take, given that this box hosts my VMs? Basically every file on that disk is important to the core system running correctly. Is --fix-fixable even appropriate for a disk of this type?
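For what it's worth, --check itself only reports; its closing summary names the repair flag to try next (--fix-fixable for minor damage, --rebuild-tree when the tree itself is broken). A tiny illustrative sketch of that decision, where the summary strings are assumptions based on typical reiserfsck output rather than exact quotes of the tool:

```shell
#!/bin/sh
# Illustrative only: map a reiserfsck --check summary line to the repair
# flag it recommends. Check rebuild-tree first, since its message can
# also mention "fixable" corruptions.
suggest_next_step() {
    summary="$1"
    case "$summary" in
        *rebuild-tree*) echo "reiserfsck --rebuild-tree" ;;
        *fix-fixable*)  echo "reiserfsck --fix-fixable" ;;
        *)              echo "no repair flag suggested" ;;
    esac
}
```

So a summary like "2 found corruptions can be fixed only when running with --rebuild-tree" maps to the rebuild-tree run.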


From what I was reading in other posts, that's likely the best outcome, as even rebuilding the disk from parity risks rebuilding the corruption along with it?

 

Unfortunately it just returned the below:

2 found corruptions can be fixed only when running with --rebuild-tree
 

Do you advise just running this and hoping for the best? What steps from there to identify any loss?


Rebuilding a disk won't fix file system corruption.

 

You need to use --rebuild-tree. reiserfsck is usually very good at repairing the filesystem without issues. When it finishes, look for a lost+found folder; if there are files there, they may be corrupt.
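A quick sketch of triaging that lost+found afterwards (path is a placeholder; on unRAID it would be under the disk mount, e.g. /mnt/disk1/lost+found). Recovered VM flat disks are usually the biggest files, so listing largest-first makes them easy to spot:

```shell
#!/bin/sh
# Sketch: list files in a lost+found directory, largest first.
# Uses GNU find's -printf (%s = size in bytes, %f = basename).
list_recovered() {
    dir="$1"
    find "$dir" -maxdepth 1 -type f -printf '%s %f\n' | sort -rn
}
```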

Edited by johnnie.black

Seriously, thanks again for the help. The output from that run looks quite bad; hopefully you can shed some expert advice on it.

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Sat Mar 18 21:28:40 2017
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 226344267 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 226313700 blocks will be read
0%block 2523084: The number of items (1531) is incorrect, should be (1) - corrected
block 2523084: The free space (65269) is incorrect, should be (4045) - corrected
pass0: vpf-10110: block 2523084, item (0): Unknown item type found [4094427913 185720068 0xfdf30df6 ??? (15)] - deleted
Segmentation fault

 


There's apparently a problem with the latest reiserfsprogs that sometimes can't complete a --rebuild-tree. I've seen it happen 2 or 3 times before; it was solved by downgrading to unRAID v6.2.4, which includes an earlier version, and then running --rebuild-tree again.

Important: before rebooting, grab and post new diagnostics; it may help LT identify the problem.

Edited by johnnie.black

So after running all night it only put 3 files in lost+found. Pretty happy with that (ignore the perms below, this is a Windows dump). My question is: given the last write times, I assume these are from an old VM I haven't used for a while (and can safely delete them).

 

But if I did want to try to restore them to their original locations, are there any 'tricks of the trade' to identify where they used to live?

    Directory: \\10.0.0.21\disk1\lost+found


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
------       10/12/2014   3:43 PM    66836967424 3778_4061
------        1/08/2015  10:09 PM             84 10256_31
------       26/08/2015   5:25 AM     4168192000 26_14345
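One possible trick, offered as a sketch rather than a proven recipe: ReiserFS lost+found names like 3778_4061 encode internal object IDs, not the original filename, but VMware descriptor .vmdk files are small text files that record each "-flat.vmdk" extent's size in 512-byte sectors. Matching a recovered file's size against the descriptors on the share can suggest which VM it belonged to. The /mnt/disk1/vSphere path here is taken from the logs above; adjust as needed:

```shell
#!/bin/sh
# Sketch: given a recovered file and the VM root directory, find any
# descriptor .vmdk whose extent sector count matches the file's size.
# Descriptor extent lines look like: RW <sectors> VMFS "<name>-flat.vmdk"
match_descriptor() {
    recovered="$1"   # e.g. /mnt/disk1/lost+found/3778_4061
    vmroot="$2"      # e.g. /mnt/disk1/vSphere
    sectors=$(( $(stat -c %s "$recovered") / 512 ))
    # -F: literal string; skip the big flat files, only scan descriptors
    grep -rlF --include='*.vmdk' --exclude='*-flat.vmdk' \
        "RW $sectors " "$vmroot" 2>/dev/null
}
```

This only works for the flat extents (the 66 GB and 4 GB files above); the tiny 84-byte file would need a different approach, such as just reading its contents.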

A couple more questions:

  1. Should I take any more steps from here to complete the process? I.e., format the drive, or replace it and let parity do its job, or should it be OK as-is?
  2. Should I update to 6.3.2 again? Assuming there is no reason not to.
  3. Do my issues with emhttp sound related to a corrupt drive? I can't see any out-of-memory errors that would crash it.

OK, so that seems to be working fine. I ran a parity check for the sake of it too.

 

I started my first VM after that, and after about 5 minutes I started getting the below and the VM locked up. Could you advise what could have changed for unRAID to start treating this as a read-only share? As I said, I haven't changed anything in this space for ages. Syslog/diagnostics attached.

Mar 19 20:26:43 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (5) Input/output error
Mar 19 20:26:43 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 391156233: bit already cleared
Mar 19 20:26:43 unRAID kernel: REISERFS (device md1): Remounting filesystem read-only
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system

 

unraid-syslog-20170319-2037.zip

unraid-diagnostics-20170319-2039.zip


There are file system problems on disk1 again; you need to run reiserfsck one more time.

 

Since they are recurring, once it's fixed again it's probably best to move all the data off that disk and format it. You can use XFS; it's the current preferred file system for data disks.

 

P.S.: VMs would perform much better on your cache disk.

P.P.S.: No need to post the syslog; it's included in the diagnostics.

 

 


No worries. Well, I'll kick that process off shortly then.

 

To be clear on my setup (re VMs running better on cache): I currently run vSphere connected via NFS, and I use disk1 for the parity protection it provides. What would you suggest if I were to move the VMs to cache while keeping them protected?


Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would still have them on the second drive as a backup?

 

EDIT: --check just finished. No errors returned this time.

Edited by kingy444
update
2 minutes ago, kingy444 said:

Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would still have them on the second drive as a backup?

 

Yes

 

3 minutes ago, kingy444 said:

EDIT: --check just finished. No errors returned this time

 

There are clearly still issues; I would proceed by moving all the data to other disk(s) and formatting that one as XFS.


Started that; it will likely take me a couple of days, unfortunately.

 

Quick question: obviously an SSD is going to be the quickest option, but that will set me back some $$$.

 

Currently I have a WD Black 7200 RPM drive as cache. Duplicating that is much cheaper, and I haven't had too much of an issue with write speeds thus far. What are your thoughts?


If you are happy with the current cache speed, I'd say use another Black. A RAID1 cache pool will still have faster writes than the array, so the VMs will also have better write speed; reads will be similar to what they were.


So the move of the VMs has gone much quicker than planned, with one exception.

 

I have hit a snag moving two of them. I've basically been using mv -v src dest and that's been fine, but with these two particular VMs the command doesn't seem to go anywhere. Once it reaches the flat vmdk file it looks like it's taking its time to copy, so I let it do its thing; when I check back a couple of hours later it's at the same spot but the disks have spun down. Nothing I can see in the syslog.

 

Any pointers on where to go from here? It can't be the file size, as other VMs were the same size or bigger.
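In the meantime, one workaround I'm considering for the stuck ones: rsync -a --partial can show progress and resume a partial transfer, and more generally a copy-verify-delete pattern is safer than a bare mv across filesystems, since the source is only removed after the copy compares byte-for-byte. A minimal portable sketch using plain cp (the function name and paths are placeholders):

```shell
#!/bin/sh
# Sketch: copy a file, verify the copy byte-for-byte with cmp, and only
# then remove the source. Returns non-zero if either step fails, leaving
# the source untouched.
move_verified() {
    src="$1"; dest_dir="$2"
    cp "$src" "$dest_dir"/ || return 1
    cmp -s "$src" "$dest_dir/$(basename "$src")" || return 1
    rm "$src"
}
```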

