(SOLVED) Failed disk followed by I/O errors



    Last week I was sick and didn't even look at my server for a few days.  When I finally got back to it I noticed that disk 4 was disabled with its contents emulated.  It's my oldest disk and I bought it heavily used, so I didn't think much of it and ordered a replacement drive.  This morning I threw that drive into my second server to run a preclear and figured that in the meantime I might move some data around.  I installed a new disk 3 weeks ago that was still mostly empty, so I downloaded unBALANCE and, remembering that the last time I used it I needed to run the Docker Safe New Perms script, ran that script again here.

 

    I shut down my Docker containers but then decided against the whole thing, uninstalled unBALANCE, and tried to restart my containers.  A lot of them kept shutting back down after starting up, and the ones that did stay up weren't working.  I tailed my syslog and saw a loop of I/O errors.

Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/Clay
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/CommunityApplicationsAppdataBackup
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/Handbrake
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/Literature
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/PlexTranscode
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/Temp
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/Terraria
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/VMC
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/appdata
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/domains
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/isos
Dec  8 18:24:47 Yuki emhttpd: error: get_fs_sizes, 6306: Input/output error (5): statfs: /mnt/user/system

    This exact list of errors repeats every second in my syslog.  I also tried reinstalling unBALANCE a couple of times, mostly out of confusion about what was happening, but could never get it to run properly.  I had to leave for quite some time, but when I got back and started looking into it again I found this line in my syslog:

Dec  8 08:42:04 Yuki kernel: XFS (md4): Corruption of in-memory data detected.  Shutting down filesystem

    At one point I also wondered whether the new permissions script had messed something up, since my containers were giving errors about accessing their config files, so I looked into restoring the appdata folder from a backup.  The restore appdata tab only shows a backup from 2019-11-18 even though the job was set to run every week, and browsing the backup location from a Windows machine shows that same single backup.  Looking directly at disk 5 in Unraid shows I should also have backups from 2019-11-25 and 2019-12-02, while the 11-18 backup that does show up is stored on disk 1.  Digging around, other files that the Unraid GUI shows on disk 5 don't appear when browsing shares from Windows, including files I have copied from those shares to my Windows machine in the past few weeks.
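For anyone comparing the two views: in Unraid each /mnt/user share is the merged union of the matching top-level folders on /mnt/disk1, /mnt/disk2, and so on, so one way to see whether disk 5's files are really gone or just not being merged into the user share is to list both views side by side.  A minimal sketch, using the backup share from the error list above (which disk actually holds the backups is an assumption here):

```shell
# The merged user-share view, served by the shfs union mount
ls -l /mnt/user/CommunityApplicationsAppdataBackup

# The same top-level folder directly on individual data disks;
# files visible here but missing under /mnt/user point to a
# problem with the union mount, not with the disk itself
ls -l /mnt/disk1/CommunityApplicationsAppdataBackup
ls -l /mnt/disk5/CommunityApplicationsAppdataBackup
```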

 

    Attempting to write a folder to any share from Windows results in this error:

An unexpected error is keeping you from creating the folder.  If you continue to receive this error, you can use the error code to search for help with this problem.

Error 0x8007045D: The request could not be performed because of an I/O device error.

    I've read enough horror stories about people taking steps to fix problems and making them worse, so I hope I haven't already gone too far.  What should I do from here?  The replacement drive is 15% through step 2 of 5 of the preclear.  Did the system switch to read-only to protect itself, and will it fix itself once I shut down and replace the bad disk?  Is there more I need to do in the meantime to recover?  Is it even recoverable?  Or should I start copying off as much as I can while the system is still functioning?

yuki-diagnostics-20191209-0032.zip


 If you were told you have drive errors, you are most likely not in the right place.

Does this not apply to me? Disk 4 is the drive that failed.

 

It also says further down that for XFS I will have to start the array in maintenance mode.  I have been hesitant to shut down or stop the array since disk 5's files were not appearing when browsing shares, and I didn't know if something had happened to that disk as well.  Unraid hasn't told me that anything is wrong with it, but with the missing files and all I wasn't sure.

 

I don't mean to doubt you; I'll run the test if you say that's what's best.  I just want to make sure I lose as little data as possible.

 

On a side note, my VMs are located on an SSD mounted by Unassigned Devices and are still working.  Tailing my syslog shows 5 more shares in the repeating error list, and my syslog is 91% full according to the dashboard.
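On the full syslog: Unraid keeps /var/log on a small RAM-backed tmpfs, so a once-per-second error loop like the one above fills it quickly, and once it is full new messages stop being recorded.  A minimal sketch of freeing it without rebooting (this discards the live in-memory log, so copy it somewhere persistent first, here assumed to be the /boot flash drive):

```shell
# Save a copy of the current log to the flash drive before discarding it
cp /var/log/syslog /boot/syslog-$(date +%Y%m%d-%H%M%S).txt

# Truncate the live log in place so logging can continue
: > /var/log/syslog
```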


I followed the instructions in your link and put the array in maintenance mode to run the test but it's just been sitting on the same step for a few hours.

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...

This is followed by a line with 3,708,871 periods.

29 minutes ago, Clay Smith said:

I followed the instructions in your link and put the array in maintenance mode to run the test but it's just been sitting on the same step for a few hours. [...]

Which command did you use?

xfs_repair -L?

15 minutes ago, Harro said:

Which command did you use? [...]

-n will just show a report  

-L  "as per the wiki" Force Log Zeroing. Forces xfs_repair to zero the log even if it is dirty (contains metadata changes). When using this option the filesystem will likely appear to be corrupt, and can cause the loss of user files and/or data.
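The two flags above map onto xfs_repair like this; a minimal sketch against the emulated disk 4 device from this thread, with the read-only pass run first:

```shell
# Dry run: scan the filesystem and report damage, change nothing
xfs_repair -n /dev/md4

# Actual repair run; add -L only if it refuses to start because of a
# dirty log, since zeroing the log discards unreplayed metadata changes
xfs_repair /dev/md4
```

Running against the md device (rather than the raw sdX device) matters on Unraid, because writes to md4 also update parity.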

26 minutes ago, Harro said:

-n will just show a report [...]

 

Did I misunderstand the page?  Should I have run it with -n first and then -L?  Would it be right to cancel it?  On the settings page for the disk it still says the check is running.

4 hours ago, johnnie.black said:

Post the complete command used.

I used the webGUI, so I didn't type a complete command.  I clicked 'Disk 4' on the Main page to get to the disk settings and then clicked the 'Check' button under the 'Check Filesystem Status' section.  In the options box I left just the -n that was there by default.  The syslog was full at the time so I can't grab what it said then, but I have since truncated the syslog.  When I ran it again last night with -L, this was the line in the syslog:

Dec  9 20:28:12 Yuki ool www[13176]: /usr/local/emhttp/plugins/dynamix/scripts/xfs_check 'start' '/dev/md4' 'WDC_WD60EZRX-00MVLB1_WD-WXL1H642CJCJ' '-L'

 


The check I started last night with -L just completed:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...
...............(5.7 million dots)...........Sorry, could not find valid secondary superblock
Exiting now.

That's all it gives.  I assume this means it's not fixed, and I'm not really sure where to go from here.  Based on the syslog line in my last post, did it run properly, or do I need to use the terminal to run a better command?


My system has a weird quirk where it won't boot unless it's hooked to a monitor, and I'm not home to plug one in right now.  When I start the array, should I start it normally or in maintenance mode?  Is there anything I can do in the meantime before I reboot tonight, or should I just hold off and report back?


I've started it and so far it's doing the same thing:

Phase 1 - find and verify superblock...
couldn't verify primary superblock - not enough secondary superblocks with matching geometry !!!

attempting to find secondary superblock...

followed by it generating a bunch of periods.

 

While the array was started in normal mode I was able to browse the shares in Windows and noticed that the appdata backups that weren't showing before were now visible, and a video file that IIRC was located on disk 4 was also visible.


In normal mode there's a mount attempt and it warns of XFS corruption, but then xfs_repair can't find the superblock.  Very weird; it suggests there's a serious problem with the filesystem on the emulated disk, possibly because parity wasn't 100% in sync.

 

Try to mount the old disk with UD.  If it mounts correctly, then since the disk looks healthy it's best to do a new config and re-sync parity.  Note that any data written to that disk after it got disabled will be lost.
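For anyone following along, mounting the old physical disk read-only outside the array, whether through the Unassigned Devices plugin or by hand, looks roughly like this (sdX1 and the mount point are placeholders; check the real device name with lsblk first):

```shell
# Identify the old disk's partition before touching anything
lsblk -o NAME,SIZE,MODEL

# Mount it read-only so nothing on it can change while verifying the
# data; norecovery also skips XFS log replay, which would otherwise
# write to the disk even on a ro mount
mkdir -p /mnt/disks/old_disk4
mount -t xfs -o ro,norecovery /dev/sdX1 /mnt/disks/old_disk4

# Spot-check the contents
ls /mnt/disks/old_disk4
```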

1 hour ago, johnnie.black said:

Try to mount the old disk with UD [...]

Should I cancel the current xfs_repair operation to do this, or wait until it completes?  If starting the array in normal mode gives me access to the (emulated?) disk, would it be worth trying to copy any of my recent writes to another drive before attempting the mount with UD?

 
