
Data loss after Drive error



Odd problem with my Unraid box.  While I was on vacation last week I got the email below.  

--

Event: Unraid array errors
Subject: Warning [MAXIMUS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 3 - WDC_WD100EMAZ-00WJTA0_2YJDWGRD (sdb) (errors 1024)

--

When I got home I noticed most of my Plex server said it didn't have access to the media.  I saw the error in the dashboard so I decided to reboot the server.  After the reboot everything checked out as healthy - I even got a second email telling me the drive had returned to normal.  However, the drive, while marked healthy, now will not mount.  The array still starts but I am missing all the data that was on that drive.  I pulled the drive and I can't get it to mount in Windows, Mac or Linux.  I figured when I got the errors that I would have to replace the disk, but it is acting like the data on that drive was removed rather than lost due to a drive failure.  Has anyone had a problem like this?  I have powered down my Unraid box for now.  I have spent so much time ripping my DVD and Blu-ray collection into Plex that I am going to be super frustrated if I have to start that all over again.  This is the whole reason I went with redundancy and not just a drive off the shelf.

 

Rick

 


For the time being keep the removed drive somewhere safe. Don't put it back in your server just yet and don't let other operating systems mess with it.

 

It was Disk 3. Since it's no longer present in your server its contents are being emulated, but there's also file system corruption, so the emulated disk won't mount. The first thing is to try to repair the emulated disk by running a file system check on it. Stop the array and restart it in Maintenance mode, then click on the text "Disk 3" on the Main page. Scroll down to Check File System Status and run the check. If you leave the "-n" in the box it will only read from the emulated disk and won't actually make any changes, so delete the "-n". You can replace it with "-v" to make the output more verbose. Wait for it to complete and post a screenshot of the result.
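
For reference, the GUI check simply runs xfs_repair against the emulated disk device, so the same thing can be done from a terminal if you prefer. A minimal sketch, assuming Disk 3 maps to /dev/md3 (newer Unraid releases name it /dev/md3p1) and the array is started in Maintenance mode; running it against the md device is what keeps parity in step with the repair:

# Dry run: -n only reports problems and writes nothing, -v makes it verbose
xfs_repair -nv /dev/md3

# Actual repair once the dry run looks reasonable (drop the -n)
xfs_repair -v /dev/md3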

 


Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - zero log...

ALERT: The filesystem has valuable metadata changes in a log which is being

destroyed because the -L option was used.

        - scan filesystem freespace and inode maps...

finobt ir_freecount/free mismatch, inode chunk 8/193538560, freecount 1 nfree 3

sb_fdblocks 76134471, counted 78574806

        - found root inode chunk

Phase 3 - for each AG...

        - scan and clear agi unlinked lists...

        - process known inodes and perform inode discovery...

        - agno = 0

        - agno = 1

        - agno = 2

        - agno = 3

        - agno = 4

        - agno = 5

        - agno = 6

        - agno = 7

        - agno = 8

Metadata corruption detected at 0x45bcd8, xfs_dir3_block block 0x1dd4a3b98/0x1000

corrupt block 0 in directory inode 8783473165

will junk block

no . entry for directory 8783473165

no .. entry for directory 8783473165

problem with directory contents in inode 8783473165

cleared inode 8783473165

data fork in ino 8783473170 claims free block 1097934190

data fork in ino 8783473170 claims free block 1097934191

data fork in ino 8783473191 claims free block 2320348258

correcting nblocks for inode 8783473191, was 876 - counted 941

data fork in ino 8783473192 claims free block 2344985649

        - agno = 9

        - agno = 10

        - agno = 11

        - agno = 12

        - agno = 13

Bad atime nsec 1007960289 on inode 14212352420, resetting to zero

Bad mtime nsec 1003799651 on inode 14212352420, resetting to zero

Bad ctime nsec 1003799651 on inode 14212352420, resetting to zero

Bad crtime nsec 1007960289 on inode 14212352420, resetting to zero

data fork in ino 14212352420 claims free block 1912285981

        - agno = 14

        - agno = 15

        - agno = 16

        - agno = 17

data fork in ino 18341726330 claims free block 2351623635

        - agno = 18

        - agno = 19

        - process newly discovered inodes...

Phase 4 - check for duplicate blocks...

        - setting up duplicate extent list...

        - check for inodes claiming duplicate blocks...

        - agno = 0

        - agno = 2

        - agno = 1

        - agno = 3

        - agno = 4

        - agno = 5

        - agno = 6

        - agno = 7

        - agno = 8

        - agno = 9

        - agno = 10

        - agno = 11

        - agno = 12

entry "resilio-sync" in shortform directory 7121304162 references free inode 8783473165

junking entry "resilio-sync" in directory inode 7121304162

corrected i8 count in directory 7121304162, was 5, now 4

        - agno = 13

        - agno = 14

        - agno = 15

        - agno = 16

        - agno = 17

        - agno = 18

        - agno = 19

Phase 5 - rebuild AG headers and trees...

        - reset superblock...

Phase 6 - check inode connectivity...

        - resetting contents of realtime bitmap and summary inodes

        - traversing filesystem ...

        - traversal finished ...

        - moving disconnected inodes to lost+found ...

disconnected inode 8783473169, moving to lost+found

disconnected inode 8783473170, moving to lost+found

disconnected inode 8783473174, moving to lost+found

disconnected inode 8783473175, moving to lost+found

disconnected inode 8783473176, moving to lost+found

disconnected inode 8783473177, moving to lost+found

disconnected inode 8783473178, moving to lost+found

disconnected inode 8783473179, moving to lost+found

disconnected inode 8783473180, moving to lost+found

disconnected inode 8783473181, moving to lost+found

disconnected inode 8783473182, moving to lost+found

disconnected inode 8783473183, moving to lost+found

disconnected inode 8783473185, moving to lost+found

disconnected inode 8783473186, moving to lost+found

disconnected inode 8783473187, moving to lost+found

disconnected inode 8783473189, moving to lost+found

disconnected inode 8783473190, moving to lost+found

disconnected inode 8783473191, moving to lost+found

disconnected inode 8783473192, moving to lost+found

disconnected inode 8783473193, moving to lost+found

disconnected inode 8783473194, moving to lost+found

disconnected inode 8783473195, moving to lost+found

disconnected inode 8783473196, moving to lost+found

disconnected inode 8783473197, moving to lost+found

disconnected inode 8783473199, moving to lost+found

disconnected inode 8783474264, moving to lost+found

disconnected dir inode 9849547124, moving to lost+found

Phase 7 - verify and correct link counts...

resetting inode 155976774 nlinks from 2 to 3

resetting inode 7121304162 nlinks from 7 to 6

Maximum metadata LSN (5:342500) is ahead of log (1:2).

Format log to cycle 8.

done

8 hours ago, riZnich said:

can I wipe the drive and add it back in and have the array rebuild?  

Questions like this make me worry. There is absolutely no point in wiping the drive since it would be completely overwritten during rebuild.

 

The reason it makes me worry is because many users think they need to format a disk before trying to use it. They have a vague (enough to be wrong) idea of what format does.

 

Format means "write an empty filesystem to this disk". That is what it has always meant in every operating system you have ever used. If you format a disk in the parity array, Unraid treats that write operation exactly as it does any other, by updating parity. So after the format, parity agrees your disk has an empty filesystem, and rebuild can only result in an empty filesystem.
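
To put numbers on it, here is a toy illustration of how single XOR parity behaves; the byte values are invented and this is only a sketch you can paste into a bash shell, not anything Unraid-specific:

# One byte at the same offset on three data disks (made-up values)
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0xF0 ))
p=$(( d1 ^ d2 ^ d3 ))                             # the parity disk stores the XOR

# Disk 3 dies: its byte is rebuilt from parity plus the surviving disks
printf 'rebuilt: 0x%02X\n' $(( p ^ d1 ^ d2 ))     # 0xF0 again, data recovered

# Format disk 3 while it is in the array: the empty-filesystem write (zeros
# here for simplicity) updates parity just like any other write...
d3=0; p=$(( d1 ^ d2 ^ d3 ))

# ...so a later rebuild can only reproduce the freshly formatted contents
printf 'after format: 0x%02X\n' $(( p ^ d1 ^ d2 ))   # 0x00, the data is gone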

 

As mentioned, it would be useful to see the diagnostics with the disk attached.

 

 


I think you misunderstood my intention.  Firstly, the drive is not in the parity array any more - it is in unassigned devices.  I was going to format the drive so that it would mount in unassigned devices, because in its current state it won't mount.  Once mountable I could run preclear on the drive, which would give a lot of data on the health of the drive.  I have already ordered a new drive to put in the array, so I am trying to assess whether this drive failed because of file system corruption or because it has physical problems.  If it is physically sound I would do a drive swap on one of my smaller drives at a later date.  If it isn't sound I will shelve it.  I understand that Unraid prepares / formats a drive when you add it to the array.  I may not have as deep a systems understanding of how to fix the system, but I do understand how the parity portion of Unraid works, so I agree - if you have a drive in the array and you format it, you are deleting all the data and parity will see this and adjust its parity data accordingly.

12 minutes ago, riZnich said:

Once mountable I could run preclear on the drive which would give a lot of data on the health of the drive.

Preclear doesn't care if a drive is mountable. Mountable means there is a valid filesystem in place; preclear writes zeroes to the entire capacity, which will remove any filesystem anyway.
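
For what it's worth, preclear's main cycle is essentially a full zero write followed by a read-back, so it exercises every sector regardless of what is on the disk. A rough by-hand equivalent, assuming the old disk shows up as /dev/sdX in unassigned devices (double-check the device name first, this wipes it):

# Zero the whole device (destroys any partition table and filesystem on it)
dd if=/dev/zero of=/dev/sdX bs=1M status=progress

# Read it all back to confirm every sector is still readable
dd if=/dev/sdX of=/dev/null bs=1M status=progress

# SMART data before and after is the better indicator of physical health
smartctl -a /dev/sdX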

2 hours ago, riZnich said:

 I was going to format the drive so that it would mount in unassigned devices because in its current state it won't mount.

The usual fix for an unmountable filesystem, and what I had in mind for the unassigned device, is to try to repair it much as you did with the emulated disk, then compare the results of that repair with the emulated repair.
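
Concretely, that just means pointing xfs_repair at the partition on the unassigned disk rather than at the emulated md device. A sketch, assuming the drive shows up as /dev/sdX with its XFS partition at /dev/sdX1 (confirm with lsblk and make sure nothing has it mounted before you run it):

lsblk -f /dev/sdX          # identify the XFS partition and check it isn't mounted
xfs_repair -nv /dev/sdX1   # dry run, report only
xfs_repair -v /dev/sdX1    # actual repair; it may insist on -L if the log is dirty,
                           # which discards the log the same way the earlier run did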

