riZnich Posted March 24, 2021
Odd problem with my Unraid box. While I was on vacation last week I got the email below.
--
Event: Unraid array errors
Subject: Warning [MAXIMUS] - array has errors
Description: Array has 1 disk with read errors
Importance: warning
Disk 3 - WDC_WD100EMAZ-00WJTA0_2YJDWGRD (sdb) (errors 1024)
--
When I got home I noticed that Plex said it didn't have access to most of my media. I saw the error in the dashboard, so I decided to reboot the server. After the reboot everything checked out as healthy - I even got a second email telling me the drive had returned to normal. However, the drive, while marked healthy, now will not mount. The array still starts but I am missing all the data that was on that drive. I pulled the drive and I can't get it to mount in Windows, Mac or Linux. When I got the errors I figured I would have to replace the disk, but it is acting as if the data on that drive was removed rather than lost due to a drive failure. Has anyone had a problem like this? I have powered down my Unraid box. I have spent so much time ripping my DVD and Blu-ray collection into Plex that I am going to be super frustrated if I have to start all over again. This is the whole reason I went with redundancy and not just a drive off the shelf.
Rick
trurl Posted March 24, 2021
It would have been better if you had asked for advice before removing the drive. Start the array without the disk, go to Tools - Diagnostics, and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
riZnich Posted March 24, 2021 Author
Yeah, I just figured I could pull it since it wasn't mounting in the array - but your point is well taken. Here is the zip requested.
maximus-diagnostics-20210323-2048.zip
riZnich Posted March 24, 2021 Author
Also, I have not altered the drive, so I can put it back in and send the diagnostics again if that would be helpful.
RM
John_M Posted March 24, 2021
For the time being keep the removed drive somewhere safe. Don't put it back in your server just yet and don't let other operating systems mess with it. It was Disk 3. Since it's no longer present in your server its contents are being emulated. But there's also file system corruption so the emulated disk won't mount.
First thing is to try to repair the emulated disk by running a file system check on it. So stop the array and re-start it in Maintenance mode, then click on the text "Disk 3" (on the Main page). Scroll down to Check File System Status and run the check. If you leave the "-n" in the box it will only read from the emulated disk and won't actually make any changes, so delete the "-n". You can replace it with "-v" to make the output more verbose. Wait for it to complete and post a screenshot of the result.
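For anyone following along from the command line rather than the GUI, here is a minimal sketch of how the options discussed in this thread ("-n", "-v", "-L") change the xfs_repair invocation. It only builds the argument list and does not run anything. The helper name `build_xfs_repair_command` is made up for illustration, and `/dev/md3` is an assumed placeholder for the emulated Disk 3 device, so check the actual device name on your own system. Refusing to combine a dry run with "-L" is a choice made in this sketch, not a statement about what xfs_repair itself does.

```python
from typing import List


def build_xfs_repair_command(device: str, dry_run: bool = True,
                             verbose: bool = False,
                             zero_log: bool = False) -> List[str]:
    """Return the argv list for an xfs_repair run; nothing is executed here."""
    if dry_run and zero_log:
        # A dry run promises no changes, while -L destroys the log,
        # so this sketch treats the combination as a mistake.
        raise ValueError("dry_run and zero_log cannot be combined")
    command = ["xfs_repair"]
    if dry_run:
        command.append("-n")  # check only, make no changes
    if verbose:
        command.append("-v")  # more detailed output
    if zero_log:
        command.append("-L")  # zero the journal before repairing
    command.append(device)
    return command


# "/dev/md3" is a placeholder, not a confirmed device name for this server.
print(build_xfs_repair_command("/dev/md3"))
print(build_xfs_repair_command("/dev/md3", dry_run=False, verbose=True))
```

The first call corresponds to leaving "-n" in the box (read-only check); the second corresponds to deleting "-n" and adding "-v" as described above.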
John_M Posted March 25, 2021
Run it again, with the "-L" option. It won't cause any more corruption. It just deletes the journal.
riZnich Posted March 25, 2021 Author
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
finobt ir_freecount/free mismatch, inode chunk 8/193538560, freecount 1 nfree 3
sb_fdblocks 76134471, counted 78574806
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
Metadata corruption detected at 0x45bcd8, xfs_dir3_block block 0x1dd4a3b98/0x1000
corrupt block 0 in directory inode 8783473165
        will junk block
no . entry for directory 8783473165
no .. entry for directory 8783473165
problem with directory contents in inode 8783473165
cleared inode 8783473165
data fork in ino 8783473170 claims free block 1097934190
data fork in ino 8783473170 claims free block 1097934191
data fork in ino 8783473191 claims free block 2320348258
correcting nblocks for inode 8783473191, was 876 - counted 941
data fork in ino 8783473192 claims free block 2344985649
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
Bad atime nsec 1007960289 on inode 14212352420, resetting to zero
Bad mtime nsec 1003799651 on inode 14212352420, resetting to zero
Bad ctime nsec 1003799651 on inode 14212352420, resetting to zero
Bad crtime nsec 1007960289 on inode 14212352420, resetting to zero
data fork in ino 14212352420 claims free block 1912285981
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
data fork in ino 18341726330 claims free block 2351623635
        - agno = 18
        - agno = 19
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 2
        - agno = 1
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
entry "resilio-sync" in shortform directory 7121304162 references free inode 8783473165
junking entry "resilio-sync" in directory inode 7121304162
corrected i8 count in directory 7121304162, was 5, now 4
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
disconnected inode 8783473169, moving to lost+found
disconnected inode 8783473170, moving to lost+found
disconnected inode 8783473174, moving to lost+found
disconnected inode 8783473175, moving to lost+found
disconnected inode 8783473176, moving to lost+found
disconnected inode 8783473177, moving to lost+found
disconnected inode 8783473178, moving to lost+found
disconnected inode 8783473179, moving to lost+found
disconnected inode 8783473180, moving to lost+found
disconnected inode 8783473181, moving to lost+found
disconnected inode 8783473182, moving to lost+found
disconnected inode 8783473183, moving to lost+found
disconnected inode 8783473185, moving to lost+found
disconnected inode 8783473186, moving to lost+found
disconnected inode 8783473187, moving to lost+found
disconnected inode 8783473189, moving to lost+found
disconnected inode 8783473190, moving to lost+found
disconnected inode 8783473191, moving to lost+found
disconnected inode 8783473192, moving to lost+found
disconnected inode 8783473193, moving to lost+found
disconnected inode 8783473194, moving to lost+found
disconnected inode 8783473195, moving to lost+found
disconnected inode 8783473196, moving to lost+found
disconnected inode 8783473197, moving to lost+found
disconnected inode 8783473199, moving to lost+found
disconnected inode 8783474264, moving to lost+found
disconnected dir inode 9849547124, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 155976774 nlinks from 2 to 3
resetting inode 7121304162 nlinks from 7 to 6
Maximum metadata LSN (5:342500) is ahead of log (1:2).
Format log to cycle 8.
done
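A repair log like the one above is easier to judge with a quick tally of how much was cleared or moved to lost+found. The following is a rough sketch, not an official tool: `summarize_repair_log` is a made-up helper, and the two regular expressions are simply modeled on the message wording that appears in this particular log, so other xfs_repair versions may word things differently.

```python
import re
from typing import Dict

# Patterns modeled on the messages seen in the log posted in this thread.
LOST_FOUND_RE = re.compile(
    r"disconnected (dir )?inode (\d+), moving to lost\+found")
CLEARED_RE = re.compile(r"cleared inode (\d+)")


def summarize_repair_log(text: str) -> Dict[str, int]:
    """Count cleared inodes and inodes moved to lost+found in a repair log."""
    inodes = dirs = 0
    for match in LOST_FOUND_RE.finditer(text):
        if match.group(1):
            dirs += 1    # "disconnected dir inode ..."
        else:
            inodes += 1  # "disconnected inode ..."
    return {
        "inodes_to_lost_found": inodes,
        "dirs_to_lost_found": dirs,
        "cleared_inodes": len(CLEARED_RE.findall(text)),
    }


# A short excerpt copied from the log above, used as sample input.
sample = """\
cleared inode 8783473165
disconnected inode 8783473169, moving to lost+found
disconnected inode 8783473170, moving to lost+found
disconnected dir inode 9849547124, moving to lost+found
"""
print(summarize_repair_log(sample))
```

Feeding it the complete saved log instead of the excerpt would give the totals for the whole repair.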
John_M Posted March 25, 2021
Ok. Now you can stop the array and restart in normal mode and see if the emulated disk mounts.
trurl Posted March 25, 2021
Restart array not in maintenance mode and post new diagnostics.
riZnich Posted March 25, 2021 Author
Detached drive mounts - log attached.
maximus-diagnostics-20210324-2235.zip
trurl Posted March 25, 2021
9 minutes ago, riZnich said: moving to lost+found
You have a new User Share named lost+found. Take a look there to see how messy the repair was.
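One way to get an overview of what ended up in lost+found is to list every recovered file with its size, smallest first. This is only a sketch: `list_recovered_files` is a hypothetical helper, and the path `/mnt/user/lost+found` is an assumption about where the new user share appears, so adjust it to match your server. It only reads metadata and changes nothing.

```python
import os
from typing import List, Tuple


def list_recovered_files(root: str) -> List[Tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # skip entries that cannot be read
            entries.append((os.path.relpath(path, root), size))
    # Smallest files first, so tiny fragments stand out at the top.
    return sorted(entries, key=lambda item: item[1])


if __name__ == "__main__":
    # Assumed location of the lost+found user share; adjust as needed.
    for rel_path, size in list_recovered_files("/mnt/user/lost+found"):
        print(f"{size:>12}  {rel_path}")
```

Files in lost+found are typically named after their inode numbers, so size is often the quickest hint as to whether something is a media file or a small leftover.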
riZnich Posted March 25, 2021 Author
Looks like I lost a folder from a Resilio Sync docker, but it didn't have any active content. Apart from that, I have a handful of files that are super small.
trurl Posted March 25, 2021
It is often the case that repair can't figure out what folders a file belongs in, or indeed, what the names of folders and files are. Put the original disk back in your server and see if you can mount it using Unassigned Devices, then post new diagnostics.
riZnich Posted March 25, 2021 Author
No, the drive will not mount.
trurl Posted March 25, 2021
You have repaired the emulated drive. Do you have another disk you can rebuild to? That will allow you to have another chance at repairing the original disk to see if you get better results.
riZnich Posted March 25, 2021 Author
I don't have one on hand. I would have to order a drive.
trurl Posted March 25, 2021
You could try to repair the Unassigned Device and see if the results are any better. I think you would have to go to the command line for that.
56 minutes ago, trurl said: post new diagnostics
riZnich Posted March 25, 2021 Author
I looked through that lost+found folder and I don't see anything critical in there that I would miss. Can I wipe the drive, add it back in, and have the array rebuild?
John_M Posted March 25, 2021
1 hour ago, riZnich said: can I wipe the drive and add it back in and have the array rebuild?
We can't answer that without seeing diagnostics with the disk back inside the server. It might be faulty. There were read errors mentioned in your first post.
trurl Posted March 25, 2021
8 hours ago, riZnich said: can I wipe the drive and add it back in and have the array rebuild?
Questions like this make me worry. There is absolutely no point in wiping the drive since it would be completely overwritten during rebuild.
The reason it makes me worry is because many users think they need to format a disk before trying to use it. They have a vague (enough to be wrong) idea of what format does. Format means "write an empty filesystem to this disk". That is what it has always meant in every operating system you have ever used. If you format a disk in the parity array, Unraid treats that write operation exactly as it does any other, by updating parity. So after the format, parity agrees your disk has an empty filesystem, and rebuild can only result in an empty filesystem.
As mentioned, it would be useful to see the diagnostics with the disk attached.
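The point about format and parity can be shown with a toy model. The sketch below assumes simple single-parity XOR across three tiny "disks" of 8 bytes each; the helper names `xor_blocks` and `rebuild` and the byte strings are invented for illustration and say nothing about how Unraid is actually implemented internally.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def rebuild(other_disks: List[bytes], parity: bytes) -> bytes:
    """Reconstruct one missing disk from parity plus all remaining disks."""
    return xor_blocks(other_disks + [parity])


# Three toy data disks and the parity computed across them.
disk1 = b"MOVIES-A"
disk2 = b"MOVIES-B"
disk3 = b"MOVIES-C"
parity = xor_blocks([disk1, disk2, disk3])

# Disk 3 goes missing: parity still describes its old contents.
print(rebuild([disk1, disk2], parity))  # b'MOVIES-C'

# Formatting disk 3 while it is in the array is just another write,
# so parity is updated to match the new, empty filesystem.
empty_fs = b"EMPTY-FS"
parity = xor_blocks([disk1, disk2, empty_fs])
print(rebuild([disk1, disk2], parity))  # b'EMPTY-FS'
```

In the first case the rebuild returns the old data; after the format, the only thing parity can give back is the empty filesystem.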
riZnich Posted March 26, 2021 Author
I think you misunderstood my intention. Firstly, the drive is not in the parity array any more - it is in Unassigned Devices. I was going to format the drive so that it would mount in Unassigned Devices, because in its current state it won't mount. Once it was mountable I could run preclear on the drive, which would give a lot of data on the health of the drive. I have already ordered a new drive to put in the array, so I am trying to assess whether this drive failed because of a file system corruption error or because it has physical problems. If it is physically sound I would do a drive swap on one of my smaller drives at a later date. If it isn't sound I will shelve it.
I understand that Unraid prepares / formats a drive when you add it to the array. I may not have as deep a systems understanding of how to fix the system, but I do understand how the parity portion of Unraid works, so I agree - if you have a drive in the array and you format it, you are deleting all the data, and parity will see this and adjust its data accordingly.
riZnich Posted March 26, 2021 Author
I did find that I can run an extended SMART test on the drive while it is unmounted, so I am currently doing that.
JonathanM Posted March 26, 2021
12 minutes ago, riZnich said: Once mountable I could run preclear on the drive which would give a lot of data on the health of the drive.
Preclear doesn't care if a drive is mountable. Mountable means there is a valid filesystem in place. Preclear writes zeroes to the entire capacity, which will remove any filesystem anyway.
trurl Posted March 26, 2021
2 hours ago, riZnich said: I was going to format the drive so that it would mount in unassigned devices because in its current state it won't mount.
The usual fix for an unmountable filesystem, and what I had in mind for you with the unassigned device, is to try to repair it, similar to what you did with the emulated disk, then compare the results of that repair with the emulated repair.