Tybio Posted December 21, 2018

All, I replaced my old E3 with an E-2176G today on a new SM board, and ran into some issues with an LSI card. The ROM would load and I could see the drives in Linux, but Unraid would get stuck mounting the only array drive on the LSI board. After some effort I got it to shut down, and moved the LSI card to another slot. Now everything /seems/ to boot properly, but the disk is in the "Error Disabled" state. Before I "Start" the array, I wanted to run the diags by the group here. I'd appreciate any input! I've got over 50TB of data on this array and am getting a bit nervous :).

tower-diagnostics-20181221-1333.zip
Tybio Posted December 21, 2018

The SMART report looks OK. I wonder if I should just try to start the array; it isn't asking to format or anything.
JonathanM Posted December 21, 2018

I haven't looked at your diagnostics, but your description leads me to believe you will need to rebuild the dropped drive; otherwise you will be vulnerable to another disk failure causing data loss. A screenshot of the main GUI page might clear some things up.
Tybio Posted December 21, 2018

Here you go! I'm not sure what will happen when I start; it doesn't give the normal "Start Array and rebuild disk" message, just the normal "Start".
JonathanM Posted December 21, 2018

Hmm. The message to format an unmountable drive wouldn't show up until you start the array, so you don't yet know whether there will be any issues. Maybe someone else will have a different opinion, but if it were me I'd start in Maintenance mode and do a file system check on all the drive slots before either rebuilding the drive in place or discarding parity and using the kicked drive as is. If the file system check comes up clean on all 8 data slots, then rebuilding slot 8 onto the same drive is probably the correct option.
Tybio Posted December 21, 2018

What's the best way to do an FS check on an XFS filesystem?
itimpi Posted December 21, 2018

Just now, Tybio said: "What's the best way to do an FS check on an XFS filesystem?"

If you start the array in Maintenance mode, you can click on each diskX entry; one of the options in the resulting dialog is to run a file system check.
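If you'd rather run JonathanM's all-slots check from the console instead of the GUI, here is a minimal sketch, assuming eight data disks exposed as /dev/md1 through /dev/md8 (the device naming matches this array; adjust the count and names to yours) and the array started in Maintenance mode:

    # Read-only XFS check (-n = no modify) on each data slot.
    # Nothing is written; any corruption is only reported.
    for i in $(seq 1 8); do
        echo "=== disk$i ==="
        xfs_repair -n /dev/md$i
    done

Because the -n pass writes nothing, it's safe to run on every slot before deciding between a rebuild and discarding parity.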
Tybio Posted December 21, 2018

I'm not seeing that option; it doesn't even know the FS on the drive for some reason.
Tybio Posted December 21, 2018

In reading the docs, I'm starting to worry. It should know what file system the disks use, and there should be a file system check section... but there isn't. Tell me I didn't lose 50+TB of data, please?
JorgeB Posted December 22, 2018

Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top.
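Once the array is started with disk8 unassigned, a quick way to sanity-check the emulated disk before committing to the rebuild is a couple of console commands; a sketch, assuming the standard /mnt/diskX mount points:

    # The missing slot is emulated from parity and mounts at its usual path.
    df -h /mnt/disk8      # did the emulated disk mount, with the expected usage?
    ls -la /mnt/disk8     # spot-check the top-level folders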
Tybio Posted December 22, 2018

OK, I ran xfs_repair from the command line as the docs said and didn't see anything obviously wrong. This is the output for the "Disabled" disk:

    root@Tower:/boot/config# xfs_repair -nv /dev/md8
    Phase 1 - find and verify superblock...
            - block cache size set to 1460776 entries
    Phase 2 - using internal log
            - zero log...
    zero_log: head block 1389819 tail block 1389819
            - scan filesystem freespace and inode maps...
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 2
            - agno = 5
            - agno = 0
            - agno = 3
            - agno = 4
            - agno = 7
            - agno = 6
            - agno = 8
            - agno = 1
            - agno = 9
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - agno = 8
            - agno = 9
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    Phase 7 - verify link counts...
    No modify flag set, skipping filesystem flush and exiting.

            XFS_REPAIR Summary    Fri Dec 21 16:09:54 2018

    Phase           Start           End             Duration
    Phase 1:        12/21 16:09:49  12/21 16:09:49
    Phase 2:        12/21 16:09:49  12/21 16:09:49
    Phase 3:        12/21 16:09:49  12/21 16:09:52  3 seconds
    Phase 4:        12/21 16:09:52  12/21 16:09:52
    Phase 5:        Skipped
    Phase 6:        12/21 16:09:52  12/21 16:09:54  2 seconds
    Phase 7:        12/21 16:09:54  12/21 16:09:54

    Total run time: 5 seconds
    root@Tower:/boot/config#

Advice on next steps?
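(For completeness, since the -nv run above is read-only and changed nothing: if a check like this had reported corruption, the repair pass would be the same command without -n. A sketch, same /dev/md8 device, array still in Maintenance mode:)

    # Actually repair the filesystem -- this one writes. Only run it after
    # reviewing the read-only -n pass, and with parity/backups in good shape.
    xfs_repair -v /dev/md8
    # If xfs_repair refuses because of a dirty log, mounting and cleanly
    # unmounting the disk replays the log; zeroing it with -L is a last
    # resort that can lose the most recent metadata changes.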
Tybio Posted December 22, 2018

3 minutes ago, johnnie.black said: "Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top."

Er, if it doesn't know the filesystem to mount with, wouldn't it just format the drive?
Tybio Posted December 22, 2018

12 minutes ago, johnnie.black said: "Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top."

Johnnie, sorry for being annoying here, but what will happen if I hit Start and for some reason it can't mount the disks? Will it just fail, or try to initialize them?
Tybio Posted December 22, 2018

OK, it loaded up fine and is showing the files on the bad disk. Going to re-add it now and start the rebuild. Thanks for the help!
trurl Posted December 22, 2018

40 minutes ago, Tybio said: "Tell me I didn't lose 50+TB of data, please?"

Just for future reference: it's almost impossible to do this without a massive, smoking hardware failure, or a theft of your server. Unlike RAID systems, each disk in Unraid (not RAID) is independent. Even if several of them did truly die, you could still get any files that are on the good ones.
Tybio Posted December 22, 2018

Rebuild finished, and all seems well! Thanks for the help.