Server died - no web gui, lost access to a disk


Nem


I upgraded to the latest version of unRAID the other day, and today I've been going through the process of upgrading my docker containers. Some strange things started happening, but unfortunately I can't go into specifics because I wasn't expecting anything to go wrong, so I didn't go looking for warning/error messages.

 

The server started its monthly parity check, and at about 70% I noticed that the web UI was no longer responsive and I couldn't access my shares. I thought the server might have crashed, so I physically restarted it. The web UI started working again and a notification said the parity check was OK (which it clearly wasn't, because I don't believe it even finished).

 

I upgraded my Sonarr container, after which all subdirectories appeared empty when I tried to access a mapped share from my Windows machine. No amount of server/client restarts made the directories visible; they could only be seen when I navigated directly to the IP (as opposed to through the mapped share).

 

I then upgraded my nginx-letsencrypt container, and after it downloaded the images an error showed up saying something about layers not being found. I didn't have time to write the message down because I then lost connection to the web UI.

 

I restarted the server and clients, and now the unRAID machine is completely dead. I can't access shares from clients, and the web UI doesn't work either.

 

I plugged a monitor and keyboard into the server. Normally it boots and sits at the login prompt, but I noticed that some extra text appears after it asks for a username:

 

https://dl.dropboxusercontent.com/u/33102508/2016-10-01%2010.03.38.jpg

 

I tried to log in as root despite the extra text, which worked, took a look in /mnt/cache/, and one of my hard drives can't be found.
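In case the exact commands matter: I was just poking around with plain ls/df, something like

ls /mnt/
df -h /mnt/disk* /mnt/cache

and any array disk that didn't mount simply doesn't show up there.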

 

As of right now the server is unusable, as I can't access either the web UI or any shares. I'm also not sure what the missing drive would have to do with the docker issues, since it's a data drive and all of my docker containers are stored on the cache SSD. Maybe it has something to do with the crash during the parity check?

 

How should I diagnose and fix this problem? At the very least I want to back up/recover the data on the server in case I end up losing anything...
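If it helps, my rough fallback plan (assuming I can get the array or the individual disks mounted again, even read-only) is just to rsync the at-risk disk off to a spare drive, where backup_target below is a placeholder for wherever I'd mount that drive:

rsync -avh --progress /mnt/disk2/ /mnt/backup_target/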


So I logged in as root by plugging a keyboard into the server and changed the line in disk.cfg so that the array no longer auto-starts.
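For reference, I believe the line in question lives in /boot/config/disk.cfg and ends up looking like this (double-check the exact name on your own flash drive):

startArray="no"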

rebooted

logged back in as root and tried to run xfs_repair -v /dev/md2

 

It gives me an error:

 

/dev/md2/: No such file or directory

Could not initialize XFS library

 

I guess that's happening because it's not part of the array anymore since I stopped it, so I should use /dev/sdXX instead. But how do I tell which one is disk 2? I have sda, sda1, ..., sde, sde1.
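The best idea I've had so far is to go by serial number, e.g.

lsblk -o NAME,SIZE,MODEL,SERIAL
ls -l /dev/disk/by-id/

(the by-id symlinks are named after model + serial and point at the sdX devices) and then compare that against the serial unRAID shows for disk 2 once the web UI is reachable again.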


I'm getting an error when I try to run xfs_repair from the command line:

 

Phase 1 - find and verify superblock...

        - block cache size set to 709592 entries

Phase 2 - using internal log

        - zero log...

zero_log: head block 2424704 tail block 2423719

ERROR: The filesystem has valuable metadata changes in a log which needs to

be replayed.  Mount the filesystem to replay the log, and unmount it before

re-running xfs_repair.  If you are unable to mount the filesystem, then use

the -L option to destroy the log and attempt a repair.

Note that destroying the log may cause corruption -- please attempt a mount

of the filesystem before doing this.
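If I'm reading that right, the sequence it wants is roughly the following (using /dev/md2 again on the assumption that the array is started in maintenance mode, and with -L strictly as a last resort since it throws away the log):

mkdir -p /mnt/test
mount /dev/md2 /mnt/test    # mounting replays the log; the mount point is just an example
umount /mnt/test
xfs_repair -v /dev/md2

and only if the mount itself fails:

xfs_repair -L /dev/md2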

 

However, I'm now able to get into the web UI, so if I click on disk 2, go into the Check Filesystem Status section, and run the repair tool with the -n flag, I get this output:

 

Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - zero log...

        - scan filesystem freespace and inode maps...

Metadata corruption detected at xfs_agf block 0x575428d9/0x200

flfirst 118 in agf 1 too large (max = 118)

agf 118 freelist blocks bad, skipping freelist scan

agi unlinked bucket 37 is 193213157 in ag 1 (inode=2340696805)

sb_ifree 210, counted 205

sb_fdblocks 495495645, counted 495211384

        - found root inode chunk

Phase 3 - for each AG...

        - scan (but don't clear) agi unlinked lists...

        - process known inodes and perform inode discovery...

        - agno = 0

        - agno = 1

        - agno = 2

        - agno = 3

        - process newly discovered inodes...

Phase 4 - check for duplicate blocks...

        - setting up duplicate extent list...

        - check for inodes claiming duplicate blocks...

        - agno = 0

        - agno = 1

        - agno = 2

        - agno = 3

No modify flag set, skipping phase 5

Phase 6 - check inode connectivity...

        - traversing filesystem ...

        - traversal finished ...

        - moving disconnected inodes to lost+found ...

disconnected inode 2340696805, would move to lost+found

Phase 7 - verify link counts...

would have reset inode 2340696805 nlinks from 0 to 1

No modify flag set, skipping filesystem flush and exiting.

 

Is it advisable at this point to use the web GUI and run it without the -n flag? Or should I be concerned about the log error?

