boosted Posted November 23, 2020

I have an 8-disk array with 6 data and 2 parity disks. I hadn't really been paying attention to the GUI, then I noticed some slowness and strange behavior from the array, so I went to look. It shows disk 3 and disk 5 with a red X and an "Unmountable: No file system" error. I also saw disk 6, if I remember correctly, as no longer being in the array. It seemed strange that with 3 out of 6 data disks out of the array, it could still emulate the data on 3 and 5. The array is not very full, so disk 6 might not have anything on it. I clicked "Mount" for disk 6, then stopped and started the array. It now shows disk 6 in the array as normal.

What should I do about disks 3 and 5? This rig is barely 2 years old; it seems odd to be losing 2 drives. Can any diagnostics be run to see what happened? It doesn't seem to show any SMART errors. I also noticed that files I copied onto the array are no longer there via the NFS share. Did I lose any data? If disks 3 and 5 are emulated, shouldn't all the data still be there?

Attached is the error log: unraid-syslog-20201123-1300.zip
trurl Posted November 23, 2020

7 minutes ago, boosted said: "Haven't really paid much attention to the GUI then I noticed"

You must set up Notifications to alert you immediately, by email or another agent, as soon as a problem is detected. Don't let one problem become multiple problems (as it seems may have happened here) and data loss.

Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
trurl Posted November 23, 2020

11 minutes ago, boosted said: "I clicked "mount" for disk 6"

Not clear what you mean here, since there is no "Mount" to click for a disk in the parity array. Other things in your post are unclear too, but diagnostics may clear some of that up.
boosted Posted November 23, 2020

Yeah, I need to get notifications set up. Here's the diagnostics ZIP. Thank you for taking a look for me.

Attached: unraid-diagnostics-20201123-1322.zip
boosted Posted November 23, 2020

1 minute ago, trurl said: "Not clear what you mean here since there is no "mount" to click for a disk in the parity array."

I made a typo. Disk 6 showed up in the Unassigned Devices section, so I clicked "Mount" for it there. At that time, disks 3 and 5 were showing their contents emulated.
trurl Posted November 23, 2020

You are on a very old (nearly 3 years) version of Unraid, so unfortunately the diagnostics you gave us won't tell us as much as newer versions would.
boosted Posted November 23, 2020

2 minutes ago, trurl said: "You are on a very old (nearly 3 years) version of Unraid..."

Hopefully it still tells you something. I'm always hesitant to upgrade.
trurl Posted November 23, 2020

Disks 3 and 5 don't appear to be connected. Shut down, check all connections, power and SATA, including any splitters. Reboot and post new diagnostics.
trurl Posted November 23, 2020

1 minute ago, boosted said: "I'm always hesitant to upgrade."

It is very difficult for us to help with a version we haven't seen in years.
boosted Posted November 23, 2020

1 minute ago, trurl said: "Disks 3 and 5 don't appear to be connected. Shut down, check all connections, power and SATA, including any splitters."

I will do that. Hopefully it's just bad cabling that came loose.
trurl Posted November 23, 2020

Since you have dual parity, it is able to emulate both of the missing disks, but unfortunately the emulated disks are unmountable. Be sure to check connections on ALL disks, since ALL disks are needed to accurately emulate the disabled disks.
boosted Posted November 23, 2020

16 minutes ago, trurl said: "Since you have dual parity, it is able to emulate both of the missing disks, but unfortunately the emulated disks are unmountable."

I understand that it can emulate 2 disks since I have 2 parity drives. But when disk 6 was in Unassigned Devices, it also said "emulated". I wonder how that happened, or how that works.

I opened up the system and checked the connections; they looked fine. I reseated the SATA and power cables on both ends. Here's the new diagnostics: unraid-diagnostics-20201123-1349.zip
trurl Posted November 23, 2020

Some things we can't tell at all from the diagnostics on that old version, and other things we can only tell if we work harder at it. For example, I have to open up multiple folders and files just to see which disks are disabled, and then compare them to the SMART reports for those disks.

Disabled and emulated disks 3 and 5 are still not mounted, but the physical disks are connected now. Disks 3 and 5 SMART attributes look OK, but neither has had any self-tests run on it yet.

The best way to proceed would be to try to repair the emulated filesystems, but first answer these 2 questions:

Do you have any spare disks of the same size or larger (but no larger than either parity)?

Do you have backups of anything important and irreplaceable?
boosted Posted November 23, 2020

2 minutes ago, trurl said: "The best way to proceed would be to try to repair the emulated filesystems but first answer these 2 questions..."

Is it wise to upgrade to the latest version of the OS right now, while in this degraded state, for better diagnostics?

I do not have a spare drive at the moment. I have no backups of the entire array, but if I lose what's on disk 3, it might be OK. From what I can tell, only things I added in the last month were lost. That tells me that, with the high-water allocation setting, Unraid may only recently have started writing to disk 3 after disks 1 and 2 reached half full, so whatever I added recently may have been lost on disk 3, and I believe I still have copies of that data. We're not talking about losing the whole array, right?

I made a huge copy of files yesterday, with multiple (6) copy streams running at the same time. That's when the issue started. It doesn't make sense that that would kill a drive, though.
trurl Posted November 23, 2020

5 minutes ago, boosted said: "No backups of the entire array"

I don't either. But I have multiple offsite copies of anything important and irreplaceable. And I have a backup Unraid server for some of the less important things, just because I had some hardware left over after upgrading my main server. Even dual parity is not a substitute for a backup plan.

7 minutes ago, boosted said: "We're not talking about losing the whole array right?"

All the mounted disks should be OK, and maybe we can fix the others.

8 minutes ago, boosted said: "I do not have a spare drive at the moment."

The reason I ask is because it might be useful to keep the original disks unchanged in any way. It is even possible that the original disks are in fact mountable, but for some reason the emulated disks are not.

In any case, we are going to start with checking the emulated filesystems of the disabled disks. Study this and ask if you have any questions: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
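For readers following along: the procedure in that wiki page comes down to running xfs_repair against the emulated (md) devices while the array is started in Maintenance mode, which keeps parity in sync with any changes. A minimal sketch, assuming disks 3 and 5 map to /dev/md3 and /dev/md5 (device names follow the slot number; adjust for your own array):

```shell
# Start the array in Maintenance mode first (Main tab), so the md
# devices exist but no filesystems are mounted.

# Read-only check of the emulated disk 3 (-n = no modify, -v = verbose).
xfs_repair -nv /dev/md3

# Same read-only check for the emulated disk 5.
xfs_repair -nv /dev/md5
```

The webGui "Check Filesystem Status" button on each disk's settings page runs the same tool; running it against /dev/mdX rather than the raw /dev/sdX device is what keeps the repair reflected in the emulated disk.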
trurl Posted November 23, 2020

15 minutes ago, boosted said: "Is it wise to upgrade to latest version of OS right now while in this degraded state for better diagnostics?"

Depending on how the filesystem checks and repairs go, it might be necessary, since there have been some important updates in that area.
boosted Posted November 23, 2020

21 minutes ago, trurl said: "Even dual parity is not a substitute for a backup plan."

I understand that parity is no substitute for backups. But I already have 2 other identical Synology DiskStations set up backing each other up, plus an APC rack-mount UPS to keep the power stable. With this third array, the funds just aren't there, lol. The DiskStations hold the absolute irreplaceables; the Unraid data is more or less replaceable. I'd be really sad if some of it isn't recoverable, but it won't affect my life, so that's the choice I made. Although I have been too lazy about disk checks on the Unraid box.

Let me read through the check wiki and get back to you. Thank you for your continued assistance, and apologies that the ancient OS version makes it difficult to match up the logs.
boosted Posted November 24, 2020

Had to finish up some stuff. Here are the results. I put the array in Maintenance mode, added verbose to the options to make it -nv, and ran the check on both drives. Disk 3 took a while, and I clicked refresh to get the result. Disk 5 took no time at all, almost as if it didn't run?

disk3:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

disk5:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.
trurl Posted November 24, 2020

Go ahead without the -n.
boosted Posted November 24, 2020

Here's disk 3 with -v:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 120736 entries
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 96
resetting superblock root inode pointer to 96
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
zero_log: head block 487811 tail block 487807
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
boosted Posted November 24, 2020

Disk 5 is much different:

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 120736 entries
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 96
resetting superblock root inode pointer to 96
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
zero_log: head block 163 tail block 163
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 64
sb_ifree 0, counted 60
sb_fdblocks 1952984865, counted 1952984857
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) were copied from a backup superblock.
Please reset with mount -o sunit=,swidth= if necessary

        XFS_REPAIR Summary    Mon Nov 23 18:00:47 2020

Phase           Start           End             Duration
Phase 1:        11/23 18:00:47  11/23 18:00:47
Phase 2:        11/23 18:00:47  11/23 18:00:47
Phase 3:        11/23 18:00:47  11/23 18:00:47
Phase 4:        11/23 18:00:47  11/23 18:00:47
Phase 5:        11/23 18:00:47  11/23 18:00:47
Phase 6:        11/23 18:00:47  11/23 18:00:47
Phase 7:        11/23 18:00:47  11/23 18:00:47

Total run time:
done
trurl Posted November 24, 2020

You will have to use the -L on disk 3. That is just the way the Linux XFS repair tool works. It is giving you a chance to mount the disk and replay the transaction log, but Unraid has already determined the disk is unmountable, so there is nothing to do but make it forget about that transaction log and proceed.

Is disk 5 mounted now?
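For reference, a sketch of the step being recommended here, again assuming disk 3 is /dev/md3 and the array is still in Maintenance mode. Note that -L zeroes the dirty metadata log, discarding whatever in-flight changes it held, which is why xfs_repair insists on a mount attempt first:

```shell
# Only after a mount attempt has failed: destroy the dirty log and repair.
# -v = verbose, -L = zero the metadata log before repairing.
xfs_repair -vL /dev/md3

# Follow up with another read-only pass; a clean filesystem should
# report no further modifications needed.
xfs_repair -nv /dev/md3
```

Anything the repair cannot reconnect ends up in a lost+found directory at the top level of that disk, which is worth checking once the array is started normally again.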
boosted Posted November 24, 2020

2 minutes ago, trurl said: "Is disk5 mounted now?"

I'm still in Maintenance mode. Do I take it out of Maintenance mode to see if disk 5 is mountable? Currently it still says both disks 3 and 5 are not mountable in Maintenance mode.
trurl Posted November 24, 2020

Go ahead and do disk 3.
boosted Posted November 24, 2020

OK, I ran -vL on disk 3:

Phase 1 - find and verify superblock...
        - block cache size set to 120736 entries
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 96
resetting superblock root inode pointer to 96
sb realtime bitmap inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 97
resetting superblock realtime bitmap ino pointer to 97
sb realtime summary inode 18446744073709551615 (NULLFSINO) inconsistent with calculated value 98
resetting superblock realtime summary ino pointer to 98
Phase 2 - using internal log
        - zero log...
zero_log: head block 487811 tail block 487807
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 18112
sb_ifree 0, counted 334
sb_fdblocks 1952984865, counted 1064388454
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Maximum metadata LSN (1:487789) is ahead of log (1:2).
Format log to cycle 4.

        XFS_REPAIR Summary    Mon Nov 23 19:10:33 2020

Phase           Start           End             Duration
Phase 1:        11/23 19:07:53  11/23 19:07:53
Phase 2:        11/23 19:07:53  11/23 19:08:42  49 seconds
Phase 3:        11/23 19:08:42  11/23 19:08:44  2 seconds
Phase 4:        11/23 19:08:44  11/23 19:08:44
Phase 5:        11/23 19:08:44  11/23 19:08:44
Phase 6:        11/23 19:08:44  11/23 19:08:45  1 second
Phase 7:        11/23 19:08:45  11/23 19:08:45

Total run time: 52 seconds
done

It still says unmountable. I ran disk 3 with -n again just to see if there are more repairs needed. Maybe there are?

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.