DarkKnight Posted December 8, 2018

I shut down my Unraid server via the GUI a couple of times this week, and when restarting it yesterday it came back up with two unmountable disks showing 'Corruption warning: Metadata has LSN (1:83814) ahead of current LSN (1:80338).' I restarted the array in maintenance mode and ran xfs_repair -v for both devices, which indicated -L was needed. I reran it with -L and the output looked good:

Phase 1 - find and verify superblock...
        - block cache size set to 2292464 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 451270 tail block 451266
ALERT: The filesystem has valuable metadata changes in a log which is
being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:452526) is ahead of log (1:2).
Format log to cycle 4.
XFS_REPAIR Summary    Sat Dec  8 08:49:37 2018

Phase       Start           End             Duration
Phase 1:    12/08 08:44:23  12/08 08:44:23
Phase 2:    12/08 08:44:23  12/08 08:45:44  1 minute, 21 seconds
Phase 3:    12/08 08:45:44  12/08 08:45:45  1 second
Phase 4:    12/08 08:45:45  12/08 08:45:45
Phase 5:    12/08 08:45:45  12/08 08:45:45
Phase 6:    12/08 08:45:45  12/08 08:45:45
Phase 7:    12/08 08:45:45  12/08 08:45:45

Total run time: 1 minute, 22 seconds
done

xfs_repair -v -L /dev/md15

Phase 1 - find and verify superblock...
        - block cache size set to 2292464 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 80338 tail block 80334
ALERT: The filesystem has valuable metadata changes in a log which is
being destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:83814) is ahead of log (1:2).
Format log to cycle 4.
XFS_REPAIR Summary    Sat Dec  8 08:50:28 2018

Phase       Start           End             Duration
Phase 1:    12/08 08:45:19  12/08 08:45:19
Phase 2:    12/08 08:45:19  12/08 08:47:15  1 minute, 56 seconds
Phase 3:    12/08 08:47:15  12/08 08:47:15
Phase 4:    12/08 08:47:15  12/08 08:47:15
Phase 5:    12/08 08:47:15  12/08 08:47:15
Phase 6:    12/08 08:47:15  12/08 08:47:15
Phase 7:    12/08 08:47:15  12/08 08:47:15

Total run time: 1 minute, 56 seconds
done

I restarted the array and it detected the disks normally, and everything 'looks' okay. Now I need to run a consistency check, but I'd like the check to treat the parity as authoritative rather than the data disks, in case there are differences. How can I do this?

diagnostics-20181208-0926.zip
JonathanM Posted December 8, 2018

2 minutes ago, DarkKnight said:
"Now I need to run a consistency check, but I'd like the check to treat the parity as authoritative rather than the data disks, in case there are differences. How can I do this?"

You can't. All you can do is a non-correcting check and see whether parity is consistent. I understand what you're getting at, but if you think it through, it's not possible. The parity disk by itself has no way of recording which member of the parity set holds the wrong bit, only that ONE of the several data disks is inconsistent. Theoretically you could examine each mismatched address offset, flip the corresponding bit on each drive one by one, and see which of the resulting solutions made the most logical sense, but you would have to determine which file was affected on each drive and check for corruption with external validation, or the address might fall in unused space, in which case you couldn't tell what was correct or incorrect at all. The best you can do is a non-correcting check; if there are errors, you would have to do a byte-level comparison with backups, or use checksums, to verify which files (if any) were affected.

tl;dr: Parity is a sum of all disks, so if the array consists of more than one data disk there is no way to tell which data disk is wrong.
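The ambiguity JonathanM describes can be sketched in a few lines of Python. This is a toy model (one byte per "disk", single XOR parity), not Unraid's actual implementation: a parity check detects the mismatch, but flipping the mismatched bits on any one data disk restores consistency, so parity alone cannot localize the error.

```python
from functools import reduce

# Toy model: one byte per "disk"; parity = XOR of all data disks.
data = [0b10110100, 0b01011100, 0b00100001]   # three data disks
parity = reduce(lambda a, b: a ^ b, data)

# Silently corrupt one bit on disk 1 (pretend we don't know which disk).
corrupted = data.copy()
corrupted[1] ^= 0b00000100

# A parity check detects the inconsistency (non-zero mismatch)...
mismatch = reduce(lambda a, b: a ^ b, corrupted) ^ parity
assert mismatch != 0

# ...but flipping those bits on ANY single disk makes the array
# parity-consistent again, so every disk is an equally valid suspect.
for i in range(len(corrupted)):
    candidate = corrupted.copy()
    candidate[i] ^= mismatch
    assert reduce(lambda a, b: a ^ b, candidate) == parity
```

This is exactly why a correcting check can only rewrite parity: there is one equation and several unknowns.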
DarkKnight Posted December 8, 2018

The server is at about 30TB used out of 50TB. There's no other backup. Unraid is capable of emulating missing disks using parity, provided enough other disks are available. If it can do that, why can't we choose to have the data corrected rather than the parity?
SkippyAlpha Posted December 8, 2018

If you just want to rebuild a disk: stop the array > unassign the trouble disk > start the array > stop the array > reassign the disk > start the array and let it rebuild.
DarkKnight Posted December 8, 2018

1 minute ago, SkippyAlpha said:
"If you want to just rebuild a disk, simply stop the array > unassign the trouble disk > start array > stop array > reassign disk > start array and start rebuilding."

I was down two disks. I did not want to take the chance of a problem occurring during the rebuild and losing all of that data. I don't have 4TB of space available outside the array to back up the emulated contents, either.
JonathanM Posted December 8, 2018

1 hour ago, DarkKnight said:
"Unraid is capable of emulating missing disks using parity, provided enough other disks are available. If it can do that, why can't we choose to have the data corrected rather than the parity?"

Which disk do you want it to correct?
DarkKnight Posted December 8, 2018 (edited)

md4 & md15 both had log errors.

Edit: I believe it was related to an unclean shutdown caused by the default shutdown timeout being too short for the disks. I set it to 7 minutes today, per the recommendation.

Edited December 8, 2018 by DarkKnight
JonathanM Posted December 8, 2018

1 minute ago, DarkKnight said:
"md4 & md15 both had log errors."

So which one is in error? My point is, parity can't tell which one is wrong, only that one (or more) of the array members is inconsistent with what is currently on the parity disk at that address.
trurl Posted December 8, 2018

1 hour ago, DarkKnight said:
"The server is at about 30TB used out of 50TB. There's no other backup. Unraid is capable of emulating missing disks using parity, provided enough other disks are available. If it can do that, why can't we choose to have the data corrected rather than the parity?"

If you have single parity, then the number of disks required to emulate a single missing disk is ALL of the other disks. If you had dual parity you would be able to rebuild both of those data disks, but that would be extremely unlikely to fix anything: filesystem corruption needs to be repaired the way you already did it. And you really must have backups. You don't have to back up everything, but you need a plan. Keep another copy of anything important and irreplaceable on another system. Parity will not save you.
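trurl's point that emulating one missing disk requires every other disk falls straight out of the parity equation. A minimal XOR sketch (again a toy model, not Unraid's real P+Q dual-parity math): the missing byte is recovered as the XOR of parity with all surviving data disks, and dropping any survivor breaks the reconstruction.

```python
from functools import reduce

def xor_all(vals):
    """XOR together a list of byte values."""
    return reduce(lambda a, b: a ^ b, vals, 0)

data = [0x4A, 0x9F, 0x03, 0xE1]        # four data disks, one byte each
parity = xor_all(data)

# "Lose" disk 2: its contents can be emulated, but only by reading
# parity AND every remaining data disk.
missing = 2
survivors = [d for i, d in enumerate(data) if i != missing]
emulated = xor_all(survivors) ^ parity
assert emulated == data[missing]

# Drop any one survivor from the read and the XOR no longer
# resolves to the missing byte.
assert xor_all(survivors[1:]) ^ parity != data[missing]
```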
DarkKnight Posted December 8, 2018

I have dual parity. My concern was the warning message that data corruption could get worse due to using -L in the repair. If that's not the case here, then I have nothing to worry about. I'm running a non-correcting parity check now.

I also noticed that after 18 consecutive months of error-free checks, I got 394 errors on my last monthly check. No new SMART warnings, but I did have to shut Unraid off a couple of times in the past month while I was doing work on my servers. I suppose I could have had an unclean shutdown then.

In terms of backups of *really* important data like photos, I do have those on multiple machines. I don't have an off-site backup configured for older photos, but it's on the list; B2 is looking pretty cheap for that. Newer photos are covered by iCloud. I have a few TB of project files for old VHS home movies that I'd be pretty pissed to lose, but uncompressed they run about 30GB/hr and I have something like 100-200 hours of footage, though not all of it has been digitized yet. I can't imagine what my ISP would do if I tried to push 10+TB of uploads in a month on top of my already high usage to back that up along with my existing irreplaceable data. That would raise a red flag I don't want, and besides, I don't want the monthly sub. It would make more sense to get a 10-14TB drive, back everything up locally, and store that drive off-site. Just not in the budget. 😒
JorgeB Posted December 8, 2018

Just now, DarkKnight said:
"I got 394 errors on my last monthly check. No new SMART warnings, but I did have to shut Unraid off a couple of times in the past month while I was doing work on my servers. I suppose I could have had an unclean shutdown then."

Most likely. Consider creating checksums for your files; they're very handy for situations like these.
Garbonzo Posted September 26, 2023

On 12/8/2018 at 11:50 AM, JorgeB said:
"Most likely. Consider creating checksums for your files; they're very handy for situations like these."

I realize this thread is about 5 years old, but considering the drive issues I'm dealing with (which you're actually helping me with right now), I thought something like this might be a good idea. Is there a specific tool for Unraid that you recommend for creating and reconciling the checksums? TIA
JorgeB Posted September 26, 2023

Personally I would use btrfs or zfs, since they automatically checksum all data. For xfs you can use the File Integrity plugin, or an external tool like corz.
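For xfs shares, a per-file hash manifest is also easy to roll by hand. Here's a minimal sketch using Python's hashlib with BLAKE2b (the File Integrity plugin and corz use their own formats; this is just the underlying idea): build a manifest once, then after a questionable parity check, verification pinpoints exactly which files changed.

```python
import hashlib
from pathlib import Path

def hash_file(path, chunk=1 << 20):
    """Hash a file in 1MiB chunks so large media files don't load into RAM."""
    h = hashlib.blake2b(digest_size=32)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def build_manifest(root):
    """Map every file under root to its current hash."""
    return {str(p): hash_file(p)
            for p in Path(root).rglob("*") if p.is_file()}

def verify(root, manifest):
    """Return the files whose hash no longer matches (or that are missing)."""
    return [p for p, digest in manifest.items()
            if not Path(p).is_file() or hash_file(p) != digest]
```

A manifest like this is the "external validation" mentioned earlier in the thread: if a non-correcting parity check reports errors, it tells you whether any actual file content changed or whether the mismatch was in unused space.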