kingy444 Posted March 18, 2017 (edited) I've seen elsewhere not to reformat, so I have not done that. The disk that has become unmountable has been used as an NFS mount for a vSphere server for the past couple of years without issue. Over the last couple of days I have been experiencing some issues with my unRAID box: it would lock up, no emhttp, no entries in the syslog, and the box becomes unresponsive to even a poweroff command, meaning a hard reset is the only option. After some troubleshooting I pinpointed the vSphere connection as the likely cause (though nothing has changed for months in that space). I disabled auto boot and fired the VMs up one at a time. Everything seemed fine for a while, so I decided to be safe and take some snapshots of the boxes. That's where things went bad. The box began to show a bunch of

Mar 18 20:24:57 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 81155249: bit already cleared

and

Mar 18 20:36:36 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/.lck-45ff0b0000000000 (30) Read-only file system

I decided to stop the array and try to restart it to remount as read/write. Now the drive is unmountable. Could something be wrong with the drive that caused the issues described above (SMART seems OK), and should I just replace it? What steps do I take from here to restore the server to its previous glory? Please provide as much detail as possible. I work in IT and I'm a gun at Windows admin, but Linux is a whole different kettle of fish. I can understand if pointed in the right direction, but please don't assume I know what I'm doing by just naming a command to run. (Kind of petrified of destroying my NAS.)

unraid-diagnostics-20170318-2049.zip unraid-syslog-20170318-2049.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 18, 2017 You need to check the file system on disk1 (md1): https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_ReiserFS_using_unRAID_v5_or_later
kingy444 Posted March 18, 2017 Author Thanks for the quick response. I've started a reiserfsck --check /dev/md1. Reading the link you provided: if I get anything other than --fix-fixable as a result, what action should I take, given that this box hosts my VMs? Basically any file on that disk is important to the core system running correctly. Is --fix-fixable even appropriate for a disk of this type?
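For reference, the check step being run here looks like this. This is a sketch: it assumes the array has been started in maintenance mode so /dev/md1 exists but is not mounted, and --check is read-only, so it reports problems without changing anything.

```shell
# Read-only filesystem check on disk1; reports problems without fixing them.
# Assumes the array is in maintenance mode so /dev/md1 is not mounted.
reiserfsck --check /dev/md1
```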
JorgeB Posted March 18, 2017 Basically you'll need to do what reiserfsck tells you. --fix-fixable is the best result, since it means little corruption.
kingy444 Posted March 18, 2017 Author From what I was reading in other posts, that's likely the best outcome, as even just replacing the disk has a risk of replicating the corruption? Unfortunately it just returned the below:

2 found corruptions can be fixed only when running with --rebuild-tree

Do you advise to just run this and hope for the best? What steps from there to identify loss?
JorgeB Posted March 18, 2017 (edited) Rebuilding a disk won't fix file system corruption. You need to use --rebuild-tree. reiserfsck is usually very good at repairing the filesystem without issues. At the end, look for a lost+found folder; if there are files there, they may be corrupt. Edited March 18, 2017 by johnnie.black
kingy444 Posted March 18, 2017 Author Seriously, thanks again for the help. I feel the output from that was quite bad. Hopefully you can shed some expert advice on it:

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Sat Mar 18 21:28:40 2017
###########
Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 226344267 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 226313700 blocks will be read
0%block 2523084: The number of items (1531) is incorrect, should be (1) - corrected
block 2523084: The free space (65269) is incorrect, should be (4045) - corrected
pass0: vpf-10110: block 2523084, item (0): Unknown item type found [4094427913 185720068 0xfdf30df6 ??? (15)] - deleted
Segmentation fault
JorgeB Posted March 18, 2017 There's apparently a problem with the latest reiserfsprogs that sometimes can't complete a --rebuild-tree. I've seen it happen 2 or 3 times before; it was solved by downgrading to unRAID v6.2.4, which includes an earlier version, and then running --rebuild-tree again.
JorgeB Posted March 18, 2017 (edited) Important: before rebooting, grab and post new diagnostics; it may help LT identify the problem. Edited March 18, 2017 by johnnie.black
kingy444 Posted March 18, 2017 Author (edited) Diagnostics attached. Is there a quick way to downgrade, or do I need to download the version and overwrite files manually? I can only see the manual method listed through Google (the update through the interface really made everything too easy, and I haven't unplugged my USB for ages). unraid-diagnostics-20170318-2142.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 18, 2017 (edited) Download v6.2.4, overwrite the three bz* files using the flash share, and reboot. Edited March 18, 2017 by johnnie.black
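Sketched out, assuming the flash drive is mounted at /boot on the server (the path to the extracted v6.2.4 zip is a placeholder, and backing up the current files first is my addition, not part of the official procedure):

```shell
# Back up the current kernel/rootfs files before overwriting them.
mkdir -p /boot/previous
cp /boot/bz* /boot/previous/

# Then copy the bz* files from the extracted v6.2.4 release over the top and reboot:
# cp /path/to/unRAIDServer-6.2.4/bz* /boot/
# reboot
```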
kingy444 Posted March 18, 2017 Author So after that ran all night, it only put 3 files in lost+found. Pretty happy with that (ignore the permissions below, this is a Windows dump). My question is: given the last write times, I assume these to be from an old VM I haven't used for a while (and can safely delete them). But if I did want to try to restore them to their original location, are there any tricks of the trade to identify where they used to live?

Directory: \\10.0.0.21\disk1\lost+found

Mode LastWriteTime Length Name
---- ------------- ------ ----
------ 10/12/2014 3:43 PM 66836967424 3778_4061
------ 1/08/2015 10:09 PM 84 10256_31
------ 26/08/2015 5:25 AM 4168192000 26_14345

A couple more questions:
Should I take any more steps from here to complete the process? i.e. format the drive, or replace it and let parity do its job, or should it be OK?
Should I update to 6.3.2 again, assuming there is no reason not to?
Do my issues with emhttp sound related to a corrupt drive? I can't see any out-of-memory errors that would crash it.
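On identifying lost+found files: since the original paths are gone after a tree rebuild, one approach is to inspect content rather than names. `file` guesses a type from the contents, and `strings` can surface embedded text (for example, a vmdk descriptor names its extents). This is a general-purpose sketch, not an unRAID-specific procedure; the file names are the ones from the listing.

```shell
cd /mnt/disk1/lost+found

# Guess each file's type from its content.
file 3778_4061 10256_31 26_14345

# Look for readable text that hints at the owner; the small 84-byte file
# is a likely candidate for a vmdk descriptor or lock file.
strings 10256_31 | head -20
```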
JorgeB Posted March 18, 2017 The disk should be OK now, but you can run reiserfsck --check to confirm. If all is well, upgrade back to v6.3.2 and see if the emhttp issues are gone.
kingy444 Posted March 19, 2017 Author (edited) OK, so that seems to be working. I ran a parity check for the sake of it too. I started my first VM after that, then after about 5 minutes started getting the below, and the VM locked up. Could you advise what could have changed to make unRAID treat this as a read-only share? As I said, I haven't changed anything in this space for ages. Syslog/diagnostics attached.

Mar 19 20:26:43 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (5) Input/output error
Mar 19 20:26:43 unRAID kernel: REISERFS error (device md1): vs-4080 _reiserfs_free_block: block 391156233: bit already cleared
Mar 19 20:26:43 unRAID kernel: REISERFS (device md1): Remounting filesystem read-only
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_open: open: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system
Mar 19 20:26:44 unRAID shfs/user: err: shfs_truncate: truncate: /mnt/disk1/vSphere/KINGPC01/KINGPC01-000001-delta.vmdk (30) Read-only file system

unraid-syslog-20170319-2037.zip unraid-diagnostics-20170319-2039.zip Edited July 3, 2022 by kingy444
JorgeB Posted March 19, 2017 There are file system problems again on disk1; you need to run reiserfsck one more time. Since they are recurring, once it's fixed again it may be best to move all data off that disk and format it. You can use XFS; it's the current preferred file system for data disks. PS: VMs would perform much better on your cache disk. PPS: No need to post the syslog separately, it's included with the diagnostics.
kingy444 Posted March 19, 2017 Author No worries, I'll kick that process off shortly then. To be clear on my setup (re VMs running better on cache): I currently run vSphere connected via NFS, and I use disk1 for the parity protection it provides. What would you suggest if I were to move the VMs to cache in order to keep them backed up?
JorgeB Posted March 19, 2017 You'd need to create a cache pool by changing the current cache file system to btrfs and adding another disk; it would then be protected in RAID1.
kingy444 Posted March 19, 2017 Author (edited) Just to clarify: that would mean that by having multiple cache drives, the VMs would be stored on both cache drives (RAID1), and if I lost one cache drive I would have them on the second disk as a backup? EDIT: --check just finished. No errors returned this time. Edited March 19, 2017 by kingy444 update
JorgeB Posted March 19, 2017 2 minutes ago, kingy444 said: just to clarify, that would mean that by having multiple cache drives, the vm's would be stored on both cache drives (RAID1) and if i lost the cache drive i would have them on the second disk as a backup? Yes. 3 minutes ago, kingy444 said: EDIT: --check just finished. No errors returned this time There are clearly still issues; I would proceed by moving all data to other disk(s) and formatting that one with XFS.
kingy444 Posted March 19, 2017 Author Started that; it will likely take me a couple of days unfortunately. Quick question: obviously an SSD is going to be the quickest option, but that will set me back some $$$. Currently I have a WD Black 7200 RPM drive as cache; duplicating that is much cheaper, and I haven't had too much of an issue so far with write speeds. What are your thoughts?
JorgeB Posted March 19, 2017 If you are happy with the current cache speed, I'd say use another Black. A RAID1 cache pool will still have faster writes than the array, so the VMs will also have better write speed; reads will be similar to what they were.
kingy444 Posted March 20, 2017 Author So the move of the VMs has gone much quicker than planned, with one exception: I have hit a snag moving two of them. Basically I'm using mv -v src dest and that's been fine, but when moving two particular VMs the command doesn't seem to go anywhere. Once it reaches the flat vmdk file it looks like it's taking its time to copy, so I let it do its thing; I check back a couple of hours later and it's at the same spot, but the disks are now spun down. Nothing I can see in the syslog. Any pointers where to go from here? It can't be the file size, as other VMs were the same size or bigger.
JorgeB Posted March 20, 2017 Try using Midnight Commander (mc); if there's an error there should be info to help find the reason. The VMs need to be shut down.
kingy444 Posted March 21, 2017 Author Couldn't work Midnight Commander out. I managed to use WinSCP to download a copy to my PC. Now that I have reformatted to XFS, are there any additional checks I should do to ensure the integrity of the disk?
JorgeB Posted March 21, 2017 A newly formatted disk should always be good.
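For peace of mind, XFS does have its own checker. A no-modify dry run looks like this; as with reiserfsck, this assumes the array is in maintenance mode so /dev/md1 is unmounted.

```shell
# -n = no-modify mode: report problems without fixing anything.
# A clean filesystem produces a short report with no error lines.
xfs_repair -n /dev/md1
```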