Everything posted by openam

  1. I just ran `kill 31637` and it let the array stop. Going to continue on...
  2. It's having a hard time stopping the array. I'm seeing the following in the syslog:

        Oct 19 12:57:01 palazzo emhttpd: shcmd (8162): umount /mnt/disk5
        Oct 19 12:57:01 palazzo root: umount: /mnt/disk5: target is busy.
        Oct 19 12:57:01 palazzo emhttpd: shcmd (8162): exit status: 32
        Oct 19 12:57:01 palazzo emhttpd: Retry unmounting disk share(s)...
        Oct 19 12:57:06 palazzo emhttpd: Unmounting disks...
        Oct 19 12:57:06 palazzo emhttpd: shcmd (8163): umount /mnt/disk5
        Oct 19 12:57:06 palazzo root: umount: /mnt/disk5: target is busy.
        Oct 19 12:57:06 palazzo emhttpd: shcmd (8163): exit status: 32
        Oct 19 12:57:06 palazzo emhttpd: Retry unmounting disk share(s)...

     Trying to figure out what was causing it shows this:

        root@palazzo:/mnt# fuser -vc /mnt/disk5/
                             USER        PID ACCESS COMMAND
        /mnt/disk5:          root     kernel mount /mnt/disk5
                             root      31637 ..c.. bash
        root@palazzo:/mnt#

     Should I just kill the 31637 PID? I think I was ssh'd in there looking at it when I started the shutdown, but cd'ing out and disconnecting the ssh session didn't resolve it.
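     For anyone hitting the same "target is busy" loop, a minimal sketch of the usual way to find and release whatever is holding the mount (standard Linux tools; the PID and mount point are just the ones from this post):

     ```bash
     # List processes using the filesystem; -c also catches processes
     # whose current working directory sits on the mount.
     fuser -vc /mnt/disk5

     # lsof gives a second opinion when pointed at the mount point.
     lsof /mnt/disk5

     # If the only holder is an interactive shell that was cd'd into the disk,
     # cd it elsewhere or terminate it, then let unRAID retry the unmount.
     kill 31637        # replace with the PID reported above
     ```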
  3. It looks like it finished with the xfs_repair.

        Phase 1 - find and verify superblock...
        Phase 2 - using internal log
                - zero log...
        ALERT: The filesystem has valuable metadata changes in a log which is
        being destroyed because the -L option was used.
                - scan filesystem freespace and inode maps...
        agf_freeblks 257216, counted 257217 in ag 0
        agi_freecount 95, counted 98 in ag 0 finobt
        sb_ifree 684, counted 778
        sb_fdblocks 899221751, counted 901244387
                - found root inode chunk
        Phase 3 - for each AG...
                - scan and clear agi unlinked lists...
                - process known inodes and perform inode discovery...
                - agno = 0
        imap claims a free inode 161821052 is in use, correcting imap and clearing inode
        cleared inode 161821052
        imap claims a free inode 161821053 is in use, correcting imap and clearing inode
        cleared inode 161821053
        imap claims a free inode 161821054 is in use, correcting imap and clearing inode
        cleared inode 161821054
                - agno = 1
                - agno = 2
                - agno = 3
                - agno = 4
                - agno = 5
                - agno = 6
                - agno = 7
                - process newly discovered inodes...
        Phase 4 - check for duplicate blocks...
                - setting up duplicate extent list...
                - check for inodes claiming duplicate blocks...
                - agno = 0
                - agno = 4
                - agno = 6
                - agno = 2
                - agno = 5
                - agno = 1
                - agno = 7
                - agno = 3
        entry "The.redacted.filename" at block 11 offset 3336 in directory inode 4316428006 references free inode 161821052
                clearing inode number in entry at offset 3336...
        Phase 5 - rebuild AG headers and trees...
                - reset superblock...
        Phase 6 - check inode connectivity...
                - resetting contents of realtime bitmap and summary inodes
                - traversing filesystem ...
        rebuilding directory inode 4316428006
                - traversal finished ...
                - moving disconnected inodes to lost+found ...
        Phase 7 - verify and correct link counts...
        resetting inode 4316428006 nlinks from 644 to 643
        Maximum metadata LSN (1:1078445) is ahead of log (1:2).
        Format log to cycle 4.
        done

     I started the array, and it looks like things are there. I don't see a `lost+found` folder looking through the unRAID UI, nor do I see it while ssh'd in and running `ls -al /mnt/disk5`. Is there some other place the lost+found folder is supposed to be located? The log indicated "moving disconnected inodes to lost+found" but I can't see anything like that. Shall I just continue with steps 3-8 in my earlier post?
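     A quick way to double-check whether xfs_repair actually created a lost+found on that disk (the directory normally only appears if disconnected inodes were really moved into it), a small sketch using standard commands:

     ```bash
     # lost+found lives at the root of the repaired filesystem, so check disk 5 directly.
     ls -ld /mnt/disk5/lost+found

     # If nothing shows, scan the top level in case the entry is named oddly.
     ls -la /mnt/disk5 | grep -i lost
     ```

     If neither turns anything up, it most likely means nothing actually needed to be moved there on this run.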
  4. Just tried running xfs_repair without the `-n` flag and got the following error:

        Phase 1 - find and verify superblock...
        Phase 2 - using internal log
                - zero log...
        ERROR: The filesystem has valuable metadata changes in a log which needs to
        be replayed.  Mount the filesystem to replay the log, and unmount it before
        re-running xfs_repair.  If you are unable to mount the filesystem, then use
        the -L option to destroy the log and attempt a repair.
        Note that destroying the log may cause corruption -- please attempt a mount
        of the filesystem before doing this.

     How do I go about mounting the drive? Should I start the array not in maintenance mode? It looks like it's sdj. Should I just ssh in and run `mount /dev/sdj /mnt`? Actually, it looks like disks normally get mounted under /mnt/disks, so what would the command be? And then what's the command to unmount, and when do I run it? Does the filesystem just need to be mounted for a short time?
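     For reference, a hedged sketch of how the mount-to-replay-the-log step is commonly done on unRAID, with the array started in maintenance mode; /dev/md5 is the parity-protected device for disk 5 (the same device the xfs_check command elsewhere in this thread targets), and /temp is just a made-up temporary mount point for illustration:

     ```bash
     # Create a temporary mount point (any empty directory works).
     mkdir -p /temp

     # Mount the parity-protected md device rather than the raw /dev/sdj so that
     # parity stays in sync; mounting replays the XFS log automatically.
     mount /dev/md5 /temp

     # Confirm the data looks sane, then unmount again; it only needs to be
     # mounted long enough for the log replay to happen.
     ls /temp
     umount /temp

     # xfs_repair can then be re-run (without -n) against the same /dev/md5 device.
     ```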
  5. Thanks for your reply. I re-seated the HBA card and all the SATA connectors when I rebooted it initially, so that the disks would come back up, and I was able to get the SMART reports. My mobo only has one PCIe slot, so I just re-seated it. Maybe I should double-check the disk 5 connections; it might be the one disk that I didn't re-seat. It requires me to pull the case apart. Is there anything I should do with that disk 5 with the `Inode allocation btrees are too corrupted` error, or does rebuilding it on top correct all those issues? Sounds like the next steps are:
     1. Restart the array and let it emulate disk 5
     2. Check the contents of disk 5 to make sure they look alright
     3. Stop the array
     4. Remove disk 5 from the array
     5. Restart the array without disk 5
     6. Stop the array
     7. Re-add disk 5 to the array
     8. Restart it to have it start rebuilding
     Steps 3-6 are how I originally had it rebuild disk 6. Is that the correct procedure? Was there anything else I should do with xfs_repair before, after, or in the middle of those steps?
  6. I ended up running the check again with the `-nv` option, because the docs indicated that I should, and that it should recommend an action. Here is the output of the `-nv` check; it still doesn't appear to include a recommended command.

        Phase 1 - find and verify superblock...
                - block cache size set to 699552 entries
        Phase 2 - using internal log
                - zero log...
        zero_log: head block 1078458 tail block 1078430
        ALERT: The filesystem has valuable metadata changes in a log which is being
        ignored because the -n option was used.  Expect spurious inconsistencies
        which may be resolved by first mounting the filesystem to replay the log.
                - scan filesystem freespace and inode maps...
        agf_freeblks 257216, counted 257217 in ag 0
        agi_freecount 95, counted 98 in ag 0 finobt
        sb_ifree 684, counted 778
        sb_fdblocks 899221751, counted 901244387
                - found root inode chunk
        Phase 3 - for each AG...
                - scan (but don't clear) agi unlinked lists...
                - process known inodes and perform inode discovery...
                - agno = 0
        imap claims a free inode 161821052 is in use, would correct imap and clear inode
        imap claims a free inode 161821053 is in use, would correct imap and clear inode
        imap claims a free inode 161821054 is in use, would correct imap and clear inode
                - agno = 1
                - agno = 2
                - agno = 3
                - agno = 4
                - agno = 5
                - agno = 6
                - agno = 7
                - process newly discovered inodes...
        Phase 4 - check for duplicate blocks...
                - setting up duplicate extent list...
                - check for inodes claiming duplicate blocks...
                - agno = 0
                - agno = 3
                - agno = 7
                - agno = 6
                - agno = 4
                - agno = 2
                - agno = 5
                - agno = 1
        entry "The.redacted.filename" at block 11 offset 3336 in directory inode 4316428006 references free inode 161821052
                would clear inode number in entry at offset 3336...
        No modify flag set, skipping phase 5
        Inode allocation btrees are too corrupted, skipping phases 6 and 7
        No modify flag set, skipping filesystem flush and exiting.

                XFS_REPAIR Summary    Wed Oct 18 22:28:36 2023

        Phase           Start           End             Duration
        Phase 1:        10/18 22:25:38  10/18 22:25:38
        Phase 2:        10/18 22:25:38  10/18 22:25:38
        Phase 3:        10/18 22:25:38  10/18 22:28:36  2 minutes, 58 seconds
        Phase 4:        10/18 22:28:36  10/18 22:28:36
        Phase 5:        Skipped
        Phase 6:        Skipped
        Phase 7:        Skipped

        Total run time: 2 minutes, 58 seconds
  7. I just ran the check on disk 5. Here's the entry from syslog:

        Oct 18 21:20:00 palazzo ool www[20601]: /usr/local/emhttp/plugins/dynamix/scripts/xfs_check 'start' '/dev/md5' 'WDC_WD80EMAZ-00WJTA0_1EHSHHLN' '-n'

     Here's the output from the UI:

        Phase 1 - find and verify superblock...
        Phase 2 - using internal log
                - zero log...
        ALERT: The filesystem has valuable metadata changes in a log which is being
        ignored because the -n option was used.  Expect spurious inconsistencies
        which may be resolved by first mounting the filesystem to replay the log.
                - scan filesystem freespace and inode maps...
        agf_freeblks 257216, counted 257217 in ag 0
        agi_freecount 95, counted 98 in ag 0 finobt
        sb_ifree 684, counted 778
        sb_fdblocks 899221751, counted 901244387
                - found root inode chunk
        Phase 3 - for each AG...
                - scan (but don't clear) agi unlinked lists...
                - process known inodes and perform inode discovery...
                - agno = 0
        imap claims a free inode 161821052 is in use, would correct imap and clear inode
        imap claims a free inode 161821053 is in use, would correct imap and clear inode
        imap claims a free inode 161821054 is in use, would correct imap and clear inode
                - agno = 1
                - agno = 2
                - agno = 3
                - agno = 4
                - agno = 5
                - agno = 6
                - agno = 7
                - process newly discovered inodes...
        Phase 4 - check for duplicate blocks...
                - setting up duplicate extent list...
                - check for inodes claiming duplicate blocks...
                - agno = 0
                - agno = 4
                - agno = 1
                - agno = 7
                - agno = 5
                - agno = 3
                - agno = 2
                - agno = 6
        entry "The.redacted.filename" at block 11 offset 3336 in directory inode 4316428006 references free inode 161821052
                would clear inode number in entry at offset 3336...
        No modify flag set, skipping phase 5
        Inode allocation btrees are too corrupted, skipping phases 6 and 7
        No modify flag set, skipping filesystem flush and exiting.

     Does this mean I should re-run without the `-n` flag, so that it can try to fix?
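     For reference, re-running without `-n` can be done from the web UI (removing `-n` from the options box) or from a shell with the array in maintenance mode; a minimal sketch against the same device the UI check used (/dev/md5, per the syslog entry above):

     ```bash
     # Read-only check: -n only reports what would be changed, nothing is written.
     xfs_repair -n /dev/md5

     # Actual repair: drop -n; -v adds verbose progress output.
     xfs_repair -v /dev/md5
     ```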
  8. I had disk 6 become detached a couple days ago. I forgot to grab diagnostics before that situation, but I was able to get it back by disabling docker, removing disk 6 from the array, starting the array, stopping the array, and then re-adding disk 6. When I restarted the array it rebuilt disk 6 from parity. I re-enabled docker the next day, and my applications seemed like they were working fine. Then I got home from work and noticed I had several errors:
     - Alert [PALAZZO] - Disk 5 in error state (disk dsbl)
     - Warning [PALAZZO] - Cache pool BTRFS missing device(s)
     - Warning [PALAZZO] - array has errors: Array has 3 disks with read errors
     After stopping the array this time, disks 3, 5, and 6 all showed as missing, so I needed to reboot to get SMART reports on them. The attached zip includes diagnostics.zip, SMART reports for disks 3, 5, and 6, and some screenshots. I rebooted a couple times, and disk 5 still shows as "Device is disabled, contents emulated". Should I update before trying to rebuild the array this time? What should I do in the future to stop the disks from getting detached? Server Logs and Screenshots.zip
  9. It appears from the preclear log that my old 4TB Disk should be fine to use now as well. preclear_disk_PK2334PBKH867T_12513.txt
  10. @trurl thanks for all your help. I'm back up and running, and it has rebuilt disk 4. I guess I can try running preclear on that old 4TB disk 4 and see if it's working again now too?
  11. Just posting an update. The read phase of the preclear has completed, and I'm about 22% (2:22:25) through the zeroing phase. It appears that disks 3 and 6 are now showing as healthy. I don't think I'll have any problem rebuilding disk 4 at this time. It looks like it'll probably be the middle of the night or tomorrow before the preclear is completed. I'm curious whether, after I install this new 8TB as the new disk 4, there is any way for me to pull it back out and use it as a 2nd parity disk? i.e. if I used unbalance and moved all data off of disk 4, is there a way to edit the config and remove it from the array? Or something like that?
  12. Well, it looks like the preclear is going to take a while. I'm at 8% into the pre-read almost an hour in. I did copy the CA Backup archive to another server; it was much bigger than I thought it would be. Anyway, I believe I have all the important/irreplaceable items copied to the other server now (note to self: set up an rsync job between servers). The good news is that I seem to have stopped getting warnings about seek error rate. I've been copying some stuff off the array and haven't received one of those warnings for about 3 hours now. I did shut down the box again and reseated the cables again. I pulled the drives out of the hotswap bay and blew the dust out with compressed air. Maybe one of those things helped. ¯\_(ツ)_/¯ I guess I'll be picking this back up tomorrow. After the preclear, what are the next steps? I assume I just stop the array, assign this new drive to disk 4, and start it back up. Is that correct?
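     On the "rsync job between servers" note, a minimal sketch of the kind of scheduled copy that could cover the irreplaceable shares (the share name, destination host, and script path here are made-up placeholders, not anything from this thread):

     ```bash
     #!/bin/bash
     # Hypothetical nightly sync of an important share to a second server over SSH.
     # -a preserves permissions/timestamps, --delete mirrors removals, -v logs what moved.
     rsync -av --delete /mnt/user/important-share/ backupserver:/backups/palazzo/important-share/

     # On unRAID this could be run by the User Scripts plugin or a cron entry, e.g.:
     # 0 3 * * * /boot/config/scripts/sync-to-backup.sh >> /var/log/sync-to-backup.log 2>&1
     ```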
  13. Probably a couple hundred GB. I think I have most of it already. I guess should I be worried about losing appdata for the dockers? The longest thing will probably be waiting for the preclear to run. It seems like that took half a day last time.
  14. Well, there was some stuff I had come to grips with losing. I'm going to try and copy it off to another server. Should I run one of the preclear functions on the new disk in the meantime? If so, which preclear option?
  15. Attached fresh diagnostics palazzo-diagnostics-20201219-1734.zip
  16. It appears that it started with an emulated disk. Should I try mounting it as an unassigned device?
  17. I guess I will start it with the red x. I have already done that before, so it shouldn't make anything worse.
  18. Should I try starting it before running any SMART checks and/or preclear? It's still showing a red x next to the disk. As for the preclear, it appears there are different options available:
     - clear
     - verify all the disk
     - verify mbr only
     - erase all the disk
     - erase and clear the disk
     I guess we can figure out which one to do after verifying that I should try starting the array with the red x. palazzo-smart-20201219-1710 - disk 4.zip
  19. I'm attaching the screenshot of the filesystem check, and the latest diagnostics. I'm going to shut down, install the new drive in an open bay, and start pre-clear and extended smart analysis on disks 3, 4, and 6 palazzo-diagnostics-20201219-1645 - plus screenshot.zip
  20. My new 8TB HDD just arrived. Should I run preclear on it? Should I run a full SMART analysis on disks 3, 4, and/or 6?
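     In case it helps anyone searching later, a small sketch of how an extended SMART test can be started from the command line (the device name /dev/sdX is a placeholder; on unRAID the same test can also be started from each disk's page in the web UI):

     ```bash
     # Start a long (extended) self-test; it runs in the background on the drive itself.
     smartctl -t long /dev/sdX

     # Check progress, the self-test log, and the overall attributes once it finishes.
     smartctl -a /dev/sdX | less
     ```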
  21. Thanks for the check disk filesystem link. It appears the check filesystem on disk 4 just finished. I am attaching a zip with the diagnostics from after the check filesystem run. I'm also including the disk 6 SMART download, since it is showing bad on the dashboard now. There are also a few screenshots:
     - when I started the run
     - a status an hour and a half later
     - the dashboard showing drive 6 now says it has issues
     - the finish output of disk 4
     Not sure if I need to re-run the check without the `-n` option now or not. I just looked at the tracking for my new HDD, and it still doesn't say out for delivery. It appears to have arrived at the local carrier facility this morning though. Thanks again for your help! palazzo-diagnostics-20201219-1021 - plus screenshots.zip
  22. I actually was moving stuff from disk 3 to disk 7 yesterday and early today, when I started getting issues. Most of what's on disk 7 is actually from disk 3. Does that mean my parity is all messed up already? If so I'd be alright with just blowing it away, and starting over, obviously I'd like to keep as much of the data as possible from the healthy drives. I think this time I'd start with double parity drives. I'd almost like to get some larger drives for parity.
  23. Should I stop the array or anything? Shut down the entire server? On the main page it does say that disk 4 has 39 errors, but on the dashboard it shows as healthy? I'm guessing the new 8TB won't be here until ~1 or 2pm PST tomorrow.
  24. I don't currently have any extra 4TB or larger drives on hand. Tomorrow I have a new 8TB showing up. I believe it originally started to do a rebuild on disk 4; I saw the blue square/circle/dot next to it when I started the array once. Not sure if that means it's messed up. I have pulled it out of the array (unassigned it from the array) and started the array that way. When it was in that state I tried to mount it using unassigned devices, and it wouldn't mount there. I think I tried that after a blue-square array start. Granted, maybe reseating the cables and drive has fixed that.