A drive failed, says emulated but content not in user share(s)


fysmd


Unraid Pro v6.6.3 - diag attached.

 

I noticed a drive was emulated earlier today (only after the Mrs said that Plex was misbehaving! - must check my notification setting!!)

The GUI says that content is emulated but actually, it's not there.  The user share is available but the files which were on the faulty drive are not there.

 

At first, navigating to the parent directory in a shell errored:

ian@Server:/mnt/user/TV$ cd The\ Deuce/
ian@Server:/mnt/user/TV/The Deuce$ ls -la
/bin/ls: reading directory '.': Input/output error
total 0
ian@Server:/mnt/user/TV/The Deuce$

So I stopped the array and restarted it. Now I can navigate to the directory, but not to the subdirectory which was on the failed drive (the season directories are split across disks).
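A user share on Unraid is the merged view of the same-named folders on the individual array disks, so one way to see which disks actually contribute a directory is to look under the per-disk mount points directly. The following is only a minimal sketch: the function name `find_on_disks` is made up for illustration, and it assumes the array disks are mounted at `/mnt/disk1`, `/mnt/disk2`, and so on.

```shell
# List every array disk that holds a copy of a share-relative path.
# Assumption: array disks are mounted at <root>/disk1, <root>/disk2, ...
#   $1 = path relative to the disk root, e.g. "TV/The Deuce"
#   $2 = mount root (optional, defaults to /mnt)
find_on_disks() {
    for disk in "${2:-/mnt}"/disk[0-9]*; do
        # -e is false both for a missing path and for an unmatched glob
        if [ -e "$disk/$1" ]; then
            printf '%s\n' "$disk/$1"
        fi
    done
}

# Example call, using the directory from the post above:
# find_on_disks "TV/The Deuce"
```

If the emulated disk's filesystem is unmountable, its mount point will not list the directory, which would match the missing season folders described here.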

 

The faulty drive (disk17) now appears in the unassigned drives section (ST3000DM001-1CH166_W1F47CAF).

 

How should I proceed?

 

 

server-diagnostics-20181201-1739.zip

Link to comment
Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

 

 

Link to comment

Thank you so much for helping so quickly!

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

So, do I just mount in unassigned drives, or start the array?

Link to comment
10 minutes ago, Squid said:

If it still comes up as unmountable after starting the array, then run it again with the -L option.  Usually no data loss happens.
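The order of steps in this exchange (check with no modifications, repair, and destroy the log only if a mount cannot replay it) can be summarised in a small helper that reads xfs_repair output and prints the suggested next step. This is only a sketch: the function name `suggest_next_step` is made up, the matched phrases are copied from the output quoted in this thread, and the device name in the usage comment is an assumption about how disk17 is exposed in Maintenance mode.

```shell
# Read xfs_repair output on stdin and print a suggested next step.
# The matched phrases are taken from the output quoted in this thread.
suggest_next_step() {
    output=$(cat)
    case "$output" in
        *"the -L option to destroy the log"*)
            echo "log-dirty: start the array normally so a mount can replay the log; rerun with -L only if the disk is still unmountable"
            ;;
        *"Cannot proceed further in no_modify mode"*)
            echo "check-only: rerun without -n so the repair can be written"
            ;;
        *)
            echo "no-known-prompt: read the output manually"
            ;;
    esac
}

# Example call (assumption: disk17 is /dev/md17 with the array in Maintenance mode):
# xfs_repair -n /dev/md17 2>&1 | suggest_next_step
```

The `-L` branch is deliberately worded as a last resort, since the tool itself warns that destroying the log may cause corruption.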

 

OK, did not mount when I restarted the array so ran with -L:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:138984) is ahead of log (1:2).
Format log to cycle 4.
done

when I try to start the array again, disk17 says unassigned...

 

I mean with the array stopped, there is no disk assignment in disk17 slot.

The drive does appear in the drop down but I guess if I reassign it here it'll erase / overwrite it?

 

Do I need to force it back into the array or something?

Link to comment

Half way through rebuilding and the contents do seem to be emulated again😄 (I was worried there!).

Thank you SOOOOO much for the help. I think without this assistance I would have removed the drive from the array, rebuilt parity, mounted the drive externally and copied any working content back to the array.

 

I had a power incident at home a couple of weeks ago, while the machine stayed up, one drive stopped working completely and I suspect another had a similar issue to this one (it mounted externally and worked, then passed a preclear without issue!)

Time I think to invest in fresh batteries for the UPS which isn't on at the moment!!

Link to comment

O-K..

I had an almost identical issue again a couple of days ago and followed the process described above (Maintenance mode, disk checks etc., rebuild the array after reassigning the drive), and the array returned to health after the rebuild.

 

Today, I have the same symptoms again but with a different drive. Diag attached, taken prior to doing anything.

 

I notice that two drives from my array appear in my unassigned drives section on the main screen.  One of them is the drive reporting failed and the other claims to still be healthy in the array, screenshot below:

1230404774_ServerMain(1).thumb.png.73fef282948edb1cc16a90438fd4503a.png

 

Array still started but not happy at all,  lots of data missing :(

 

Am I doing something really wrong somewhere? I've been running Unraid for a very long time without any issues at all, but it seems to be all wrong right now!

 

Please help!

 

server-diagnostics-20181205-2219.zip

Link to comment

This sounds odd to me.

I do not have any stand-alone port multipliers, could they be a part of the card I'm using??

Also, I have been running this config for many years now and I have only ever had similar issues when I was mixing REISERFS and XFS, I migrated all data without issue and it's been stable since then (until now!).

 

I have upgraded unraid - is it possible (advisable?) to downgrade back to a version which did not exhibit these errors?

 

I have also taken the plunge and gone for one of these puppies: 

https://www.scan.co.uk/products/24-port-broadcom-sas-9305-24i-host-bus-adaptor-internal-12gb-s-sas-pcie-30

 

it's on the recommended HW list so I ought to be golden with this fella - will I??

 

I have another question regarding rebuilding now.  I have disabled all software which might be trying to write changes to my unraid array but with one drive (allegedly) missing and another in a very off state, how should I proceed to get back healthy?

 

Obviously if both drives refuse to be recognised I can't rebuild the data from parity :(

 

Link to comment
1 minute ago, fysmd said:

I do not have any stand-alone port multipliers, could they be a part of the card I'm using??

The two Marvell controllers you have (they appear to be 8 ports each) have a builtin port-multiplier, and while one of them seems to be behaving for now the other one filled the log with ATA errors, timeouts, disconnects, reconnects and such.
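To get a rough picture of which ports are affected, the error lines in the syslog can be tallied per ATA port. This is a sketch only: the function name `count_ata_errors` is made up, the keywords are typical Linux libata kernel messages rather than lines taken from the attached diagnostics, and the syslog path in the usage comment is an assumption.

```shell
# Count error-looking kernel messages per ATA port in a syslog file.
#   $1 = path to the syslog file
# Output: one "ataN count" pair per line, sorted by port name.
count_ata_errors() {
    awk '
        /ata[0-9]+(\.[0-9]+)?: .*(exception|failed command|hard resetting link|timeout)/ {
            # Keep only the port part, so ata5 and ata5.00 are counted together
            match($0, /ata[0-9]+/)
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (port in count) print port, count[port] }
    ' "$1" | sort
}

# Example call (assumption: the live syslog is at /var/log/syslog):
# count_ata_errors /var/log/syslog
```

Ports with high counts can then be matched to their controller, which helps confirm whether the problems cluster on one card.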

 

3 minutes ago, fysmd said:

I have upgraded unraid - is it possible (advisable?) to downgrade back to a version which did not exhibit these errors?

If they were working better then it might be a good idea, at least until you replace them.

 

3 minutes ago, fysmd said:

That's a good option.

 

4 minutes ago, fysmd said:

I have another question regarding rebuilding now.  I have disabled all software which might be trying to write changes to my unraid array but with one drive (allegedly) missing and another in a very off state, how should I proceed to get back healthy?

Only one disk is disabled; the others just dropped offline since they are on the same controller. If you reboot they should come back online, though I would wait for the LSI before doing the rebuild.

Link to comment

OK, well my new toy will be with me tomorrow so I'll take it from there.

Can I just check the approach for when I get the new controller: after reconnecting everything, should I expect the one disabled drive to still be disabled? Is there a way to force it back into life, or is it safer (better) to just allow the array to rebuild onto the same disk again?

Link to comment
21 minutes ago, fysmd said:

after reconnecting everything should I expect the one disabled drive to still be disabled?

Yes

 

22 minutes ago, fysmd said:

Is there a way to force it back into life, or is it safer (better) to just allow the array to rebuild onto the same disk again?

If you're sure no data was written to that disk after it got disabled you can do a new config and resync parity instead.

Link to comment
