December 21, 2018 All, I replaced my old E3 with an E-2176G today on a new SM board, and ran into some issues with an LSI card. The ROM would load and I could see the drives in Linux...but Unraid would get stuck mounting the only array drive on the LSI board. After some effort I got it to shut down, and moved the LSI card to another slot. Now everything /seems/ to boot properly, but the disk is in "Error Disabled" state. Before I "Start" the array, I wanted to run the diags by the group here. I'd appreciate any input! I've got over 50TB of data on this array and am getting a bit nervous :). tower-diagnostics-20181221-1333.zip Edited December 22, 2018 by Tybio
December 21, 2018 Author The SMART report looks OK. I wonder if I should just try to start the array; it isn't asking to format or anything.
December 21, 2018 I haven't looked at your diagnostics, but your description leads me to believe you will need to rebuild the dropped drive; otherwise you will be vulnerable to another disk failure causing data loss. A screenshot of the main GUI page might clear up some things.
December 21, 2018 Author Here you go! I'm not sure what will happen when I start. It doesn't give the usual "Start Array and rebuild disk" message...just the plain "Start".
December 21, 2018 Hmm. The message to format an unmountable drive wouldn't show up until you start the array, so you don't yet know if there will be any issues. Maybe someone else will have a different opinion, but if it were me, I'd start in maintenance mode and do a file system check on all the drive slots before either rebuilding the drive in place or discarding parity and using the kicked drive as is. If the file system check comes up clean on all 8 data slots, then rebuilding slot 8 onto the same drive is probably the correct option.
December 21, 2018 Community Expert
Just now, Tybio said: What's the best way to do an FS check on an XFS filesystem?
If you start the array in Maintenance mode, you can click on each diskX entry, and one of the options in the resulting dialog is to run a file system check.
December 21, 2018 Author I'm not seeing that option. It doesn't even know the FS on the drive for some reason.
December 21, 2018 Author In reading the docs, I'm starting to worry. It should know what file system the disks are, and have a file system check section...but it doesn't. Tell me I didn't lose 50+TB of data please? Edited December 22, 2018 by Tybio
December 22, 2018 Community Expert Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top.
December 22, 2018 Author OK, I ran xfs_repair from the command line as the docs said and didn't see anything obviously wrong. This is the output for the "Disabled" disk:

root@Tower:/boot/config# xfs_repair -nv /dev/md8
Phase 1 - find and verify superblock...
        - block cache size set to 1460776 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1389819 tail block 1389819
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 5
        - agno = 0
        - agno = 3
        - agno = 4
        - agno = 7
        - agno = 6
        - agno = 8
        - agno = 1
        - agno = 9
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Fri Dec 21 16:09:54 2018

Phase           Start           End             Duration
Phase 1:        12/21 16:09:49  12/21 16:09:49
Phase 2:        12/21 16:09:49  12/21 16:09:49
Phase 3:        12/21 16:09:49  12/21 16:09:52  3 seconds
Phase 4:        12/21 16:09:52  12/21 16:09:52
Phase 5:        Skipped
Phase 6:        12/21 16:09:52  12/21 16:09:54  2 seconds
Phase 7:        12/21 16:09:54  12/21 16:09:54

Total run time: 5 seconds
root@Tower:/boot/config#

Advice on next steps?
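For anyone following along: the dry-run check in the post above can be repeated for every data slot, as suggested earlier in the thread. The following is only a minimal sketch, assuming the array is started in Maintenance mode and that the data slots show up as /dev/md1 through /dev/md8, matching the /dev/md8 device used above (device naming may differ on other Unraid versions). The function name check_slots and its optional third argument are made up for illustration and are not part of Unraid or xfsprogs.

```shell
#!/bin/bash
# Dry-run XFS check over a range of array data slots.
# Assumptions: array started in Maintenance mode; slots appear as /dev/mdN.
# check_slots is a hypothetical helper name, not an Unraid or xfsprogs tool.
check_slots() {
  local first=$1 last=$2
  local checker=${3:-xfs_repair}   # overridable so the loop can be tried without real disks
  local failed=0 i dev
  for ((i = first; i <= last; i++)); do
    dev="/dev/md${i}"
    echo "== ${dev} =="
    # -n: no-modify mode; problems are reported but nothing is written
    if ! "$checker" -n "$dev"; then
      echo "!! ${dev}: check reported problems or could not run"
      failed=1
    fi
  done
  return "$failed"
}

# Example: check data slots 1 through 8
# check_slots 1 8
```

With -n, xfs_repair exits non-zero when it detects corruption, so the function returns 1 if any slot needs a closer look. Nothing is repaired here; an actual repair would mean rerunning xfs_repair without -n on the affected slot.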
December 22, 2018 Author
3 minutes ago, johnnie.black said: Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top.
Er, if it doesn't know the filesystem to mount with, wouldn't it just format the drive? Edited December 22, 2018 by Tybio
December 22, 2018 Author
12 minutes ago, johnnie.black said: Unassign disk8 and start the array; if the emulated disk mounts and the data looks correct, rebuild on top.
Johnnie, sorry for being annoying here, but what will happen if I hit start and for some reason it can't mount the disks? Will it just fail, or try to initialize them?
December 22, 2018 Author OK, it loaded up fine and is showing the files on the bad disk. Going to re-add it now and start the rebuild. Thanks for the help!
December 22, 2018 Community Expert
40 minutes ago, Tybio said: Tell me I didn't lose 50+TB of data please?
Just for future reference, it's almost impossible to do this without a massive, smoking hardware failure or a theft of your server. Unlike RAID systems, each disk in Unraid (not RAID) is independent. Even if several of them did truly die, you could still get any files that are on the good ones.