
Data loss after Drive error



Odd problem with my Unraid box.  While I was on vacation last week I got the email below.  

--

Event: Unraid array errors
Subject: Warning [MAXIMUS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 3 - WDC_WD100EMAZ-00WJTA0_2YJDWGRD (sdb) (errors 1024)

--

When I got home I noticed most of my Plex server said it didn't have access to the media.  I saw the error in the dashboard so I decided to reboot the server.  After the reboot everything checked out as healthy - I even got a second email telling me the drive had returned to normal.  However, the drive, while marked healthy, now will not mount.  The array still starts but I am missing all the data that was on that drive.  I pulled the drive and I can't get it to mount in Windows, Mac or Linux.  I figured when I got the errors that I would have to replace the disk, but it is acting like the data on that drive was removed rather than lost due to a drive failure.  Has anyone had a problem like this?  I have powered down my Unraid box for now.  I have spent so much time ripping my DVD and Blu-ray collection into Plex that I am going to be super frustrated if I have to start that all over again.  This is the whole reason I went with redundancy and not just a drive off the shelf.

 

Rick

 


For the time being keep the removed drive somewhere safe. Don't put it back in your server just yet and don't let other operating systems mess with it.

 

It was Disk 3. Since it's no longer present in your server its contents are being emulated, but there's also file system corruption, so the emulated disk won't mount. The first thing is to try to repair the emulated disk by running a file system check on it. Stop the array and restart it in Maintenance mode, then click on the text "Disk 3" on the Main page. Scroll down to Check File System Status and run the check. If you leave the "-n" in the box it will only read from the emulated disk and won't actually make any changes, so delete the "-n". You can replace it with "-v" to make the output more verbose. Wait for it to complete and post a screenshot of the result.
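
For reference, the GUI check simply runs xfs_repair against the emulated disk device, so the same thing can be done from a terminal if you prefer. A minimal sketch, assuming Disk 3 maps to /dev/md3 (newer Unraid releases name it /dev/md3p1) and the array is started in Maintenance mode; running it against the md device is what keeps parity in step with the repair:

# Dry run: -n only reports problems and writes nothing, -v makes it verbose
xfs_repair -nv /dev/md3

# Actual repair once the dry run looks reasonable (drop the -n)
xfs_repair -v /dev/md3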

 


Phase 1 - find and verify superblock...

Phase 2 - using internal log

        - zero log...

ALERT: The filesystem has valuable metadata changes in a log which is being

destroyed because the -L option was used.

        - scan filesystem freespace and inode maps...

finobt ir_freecount/free mismatch, inode chunk 8/193538560, freecount 1 nfree 3

sb_fdblocks 76134471, counted 78574806

        - found root inode chunk

Phase 3 - for each AG...

        - scan and clear agi unlinked lists...

        - process known inodes and perform inode discovery...

        - agno = 0

        - agno = 1

        - agno = 2

        - agno = 3

        - agno = 4

        - agno = 5

        - agno = 6

        - agno = 7

        - agno = 8

Metadata corruption detected at 0x45bcd8, xfs_dir3_block block 0x1dd4a3b98/0x1000

corrupt block 0 in directory inode 8783473165

will junk block

no . entry for directory 8783473165

no .. entry for directory 8783473165

problem with directory contents in inode 8783473165

cleared inode 8783473165

data fork in ino 8783473170 claims free block 1097934190

data fork in ino 8783473170 claims free block 1097934191

data fork in ino 8783473191 claims free block 2320348258

correcting nblocks for inode 8783473191, was 876 - counted 941

data fork in ino 8783473192 claims free block 2344985649

        - agno = 9

        - agno = 10

        - agno = 11

        - agno = 12

        - agno = 13

Bad atime nsec 1007960289 on inode 14212352420, resetting to zero

Bad mtime nsec 1003799651 on inode 14212352420, resetting to zero

Bad ctime nsec 1003799651 on inode 14212352420, resetting to zero

Bad crtime nsec 1007960289 on inode 14212352420, resetting to zero

data fork in ino 14212352420 claims free block 1912285981

        - agno = 14

        - agno = 15

        - agno = 16

        - agno = 17

data fork in ino 18341726330 claims free block 2351623635

        - agno = 18

        - agno = 19

        - process newly discovered inodes...

Phase 4 - check for duplicate blocks...

        - setting up duplicate extent list...

        - check for inodes claiming duplicate blocks...

        - agno = 0

        - agno = 2

        - agno = 1

        - agno = 3

        - agno = 4

        - agno = 5

        - agno = 6

        - agno = 7

        - agno = 8

        - agno = 9

        - agno = 10

        - agno = 11

        - agno = 12

entry "resilio-sync" in shortform directory 7121304162 references free inode 8783473165

junking entry "resilio-sync" in directory inode 7121304162

corrected i8 count in directory 7121304162, was 5, now 4

        - agno = 13

        - agno = 14

        - agno = 15

        - agno = 16

        - agno = 17

        - agno = 18

        - agno = 19

Phase 5 - rebuild AG headers and trees...

        - reset superblock...

Phase 6 - check inode connectivity...

        - resetting contents of realtime bitmap and summary inodes

        - traversing filesystem ...

        - traversal finished ...

        - moving disconnected inodes to lost+found ...

disconnected inode 8783473169, moving to lost+found

disconnected inode 8783473170, moving to lost+found

disconnected inode 8783473174, moving to lost+found

disconnected inode 8783473175, moving to lost+found

disconnected inode 8783473176, moving to lost+found

disconnected inode 8783473177, moving to lost+found

disconnected inode 8783473178, moving to lost+found

disconnected inode 8783473179, moving to lost+found

disconnected inode 8783473180, moving to lost+found

disconnected inode 8783473181, moving to lost+found

disconnected inode 8783473182, moving to lost+found

disconnected inode 8783473183, moving to lost+found

disconnected inode 8783473185, moving to lost+found

disconnected inode 8783473186, moving to lost+found

disconnected inode 8783473187, moving to lost+found

disconnected inode 8783473189, moving to lost+found

disconnected inode 8783473190, moving to lost+found

disconnected inode 8783473191, moving to lost+found

disconnected inode 8783473192, moving to lost+found

disconnected inode 8783473193, moving to lost+found

disconnected inode 8783473194, moving to lost+found

disconnected inode 8783473195, moving to lost+found

disconnected inode 8783473196, moving to lost+found

disconnected inode 8783473197, moving to lost+found

disconnected inode 8783473199, moving to lost+found

disconnected inode 8783474264, moving to lost+found

disconnected dir inode 9849547124, moving to lost+found

Phase 7 - verify and correct link counts...

resetting inode 155976774 nlinks from 2 to 3

resetting inode 7121304162 nlinks from 7 to 6

Maximum metadata LSN (5:342500) is ahead of log (1:2).

Format log to cycle 8.

done

8 hours ago, riZnich said:

can I wipe the drive and add it back in and have the array rebuild?  

Questions like this make me worry. There is absolutely no point in wiping the drive since it would be completely overwritten during rebuild.

 

The reason it makes me worry is because many users think they need to format a disk before trying to use it. They have a vague (enough to be wrong) idea of what format does.

 

Format means "write an empty filesystem to this disk". That is what it has always meant in every operating system you have ever used. If you format a disk in the parity array, Unraid treats that write operation exactly as it does any other, by updating parity. So after the format, parity agrees your disk has an empty filesystem, and rebuild can only result in an empty filesystem.
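
To put numbers on it, here is a toy illustration of how single XOR parity behaves; the byte values are invented and this is only a sketch you can paste into a bash shell, not anything Unraid-specific:

# One byte at the same offset on three data disks (made-up values)
d1=$(( 0xA5 )); d2=$(( 0x3C )); d3=$(( 0xF0 ))
p=$(( d1 ^ d2 ^ d3 ))                             # the parity disk stores the XOR

# Disk 3 dies: its byte is rebuilt from parity plus the surviving disks
printf 'rebuilt: 0x%02X\n' $(( p ^ d1 ^ d2 ))     # 0xF0 again, data recovered

# Format disk 3 while it is in the array: the empty-filesystem write (zeros
# here for simplicity) updates parity just like any other write...
d3=0; p=$(( d1 ^ d2 ^ d3 ))

# ...so a later rebuild can only reproduce the freshly formatted contents
printf 'after format: 0x%02X\n' $(( p ^ d1 ^ d2 ))   # 0x00, the data is gone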

 

As mentioned, it would be useful to see the diagnostics with the disk attached.

 

 


I think you misunderstood my intention.  Firstly, the drive is not in the parity array any more - it is in unassigned devices.  I was going to format the drive so that it would mount in unassigned devices, because in its current state it won't mount.  Once mountable I could run preclear on the drive, which would give a lot of data on the health of the drive.  I have already ordered a new drive to put in the array, so I am trying to assess whether this drive failed because of file system corruption or because it has physical problems.  If it is physically sound I would do a drive swap on one of my smaller drives at a later date.  If it isn't sound I will shelve it.  I understand that Unraid prepares / formats a drive when you add it to the array.  I may not have as deep a systems understanding of how to fix the system, but I do understand how the parity portion of Unraid works, so I agree - if you have a drive in the array and you format it, you are deleting all the data and parity will see this and adjust its parity data accordingly.

12 minutes ago, riZnich said:

Once mountable I could run preclear on the drive which would give a lot of data on the health of the drive.

Preclear doesn't care if a drive is mountable. Mountable means there is a valid filesystem in place; preclear writes zeroes to the entire capacity, which will remove any filesystem anyway.
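
For what it's worth, preclear's main cycle is essentially a full zero write followed by a read-back, so it exercises every sector regardless of what is on the disk. A rough by-hand equivalent, assuming the old disk shows up as /dev/sdX in unassigned devices (double-check the device name first, this wipes it):

# Zero the whole device (destroys any partition table and filesystem on it)
dd if=/dev/zero of=/dev/sdX bs=1M status=progress

# Read it all back to confirm every sector is still readable
dd if=/dev/sdX of=/dev/null bs=1M status=progress

# SMART data before and after is the better indicator of physical health
smartctl -a /dev/sdX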

2 hours ago, riZnich said:

 I was going to format the drive so that it would mount in unassigned devices because in its current state it won't mount.

The usual fix for an unmountable filesystem, and what I had in mind for the unassigned device, is to try to repair it much as you did with the emulated disk, then compare the results of that repair with the emulated repair.
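
Concretely, that just means pointing xfs_repair at the partition on the unassigned disk rather than at the emulated md device. A sketch, assuming the drive shows up as /dev/sdX with its XFS partition at /dev/sdX1 (confirm with lsblk and make sure nothing has it mounted before you run it):

lsblk -f /dev/sdX          # identify the XFS partition and check it isn't mounted
xfs_repair -nv /dev/sdX1   # dry run, report only
xfs_repair -v /dev/sdX1    # actual repair; it may insist on -L if the log is dirty,
                           # which discards the log the same way the earlier run did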

