[Solved] Hardware upgrade issues (Disabled Disk)

December 21, 2018

All,

I replaced my old E3 with an E-2176G today on a new SM board and ran into some issues with an LSI card. The ROM would load and I could see the drives in Linux, but Unraid would get stuck mounting the only array drive on the LSI board. After some effort I got it to shut down and moved the LSI card to another slot. Now everything /seems/ to boot properly, but the disk is in the "Error Disabled" state. Before I "Start" the array, I wanted to run the diags by the group here.

I'd appreciate any input! I've got over 50TB of data on this array and am getting a bit nervous :).

tower-diagnostics-20181221-1333.zip

Edited December 22, 2018 by Tybio

December 21, 2018

Author

The SMART report looks ok. Wonder if I should just try to start the array, it isn't asking to format or anything.

December 21, 2018

Haven't looked at your diagnostics, but your description leads me to believe you will need to rebuild the dropped drive, otherwise you will be vulnerable to another disk failure causing data loss.

A screenshot of the main GUI page might clear up some things.

December 21, 2018

Author

Here you go! I'm not sure what will happen when I start, it doesn't give the normal "Start Array and rebuild disk" message...just the normal "Start".

December 21, 2018

Hmm. The message to format an unmountable drive wouldn't show up until you start the array, so you don't yet know if there will be any issues.

Maybe someone else will have a different opinion, but I think if it were me I'd start in maintenance mode and do a file system check on all the drive slots before either rebuilding the drive in place or discarding parity and using the kicked drive as is.

If the file system check comes up clean on all 8 data slots, then rebuilding slot 8 onto the same drive is probably the correct option.

December 21, 2018

Author

What's the best way to do an FS check on an xfs filesystem?

December 21, 2018

Community Expert

Just now, Tybio said:

What's the best way to do an FS check on an xfs filesystem?

If you start the array in Maintenance mode then you can click on each diskX entry and from the resulting dialog one of the options is to run a file system check.

December 21, 2018

Author

I'm not seeing that option, it doesn't even know the FS on the drive for some reason.

December 21, 2018

Author

In reading the docs, I'm starting to worry. It should know what file system the disks are, and have a file system check section...but it doesn't. Tell me I didn't lose 50+TB of data please?

Edited December 22, 2018 by Tybio

December 22, 2018

Community Expert

Unassign disk8 and start the array. If the emulated disk mounts and the data looks correct, rebuild on top.
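
For readers unsure what "emulated" means here, the toy sketch below shows the idea under a stated assumption: that the array uses single parity computed as a bytewise XOR across the data disks. The byte values and variable names are made up for illustration, and only three "disks" are modeled instead of eight.

```shell
# Toy model: one byte per data disk. Values are arbitrary, for illustration only.
d1=$((0x3C))
d2=$((0xA5))
d8=$((0x5F))

# Parity byte as it was written while every disk was healthy (assumed XOR parity).
parity=$(( d1 ^ d2 ^ d8 ))

# Disk 8 is unassigned: its byte is recomputed from parity plus the surviving disks.
emulated_d8=$(( parity ^ d1 ^ d2 ))

printf 'original disk8 byte: 0x%02X\n' "$d8"
printf 'emulated disk8 byte: 0x%02X\n' "$emulated_d8"
```

If the emulated contents mount and look right, parity and the other disks are consistent, which is why rebuilding on top of the same physical drive is reasonable at that point.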

December 22, 2018

Author

OK, I ran xfs_repair from the command line as the docs said and didn't see anything obviously wrong. This is the output for the "Disabled" disk:

root@Tower:/boot/config# xfs_repair -nv /dev/md8
Phase 1 - find and verify superblock...
        - block cache size set to 1460776 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 1389819 tail block 1389819
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 2
        - agno = 5
        - agno = 0
        - agno = 3
        - agno = 4
        - agno = 7
        - agno = 6
        - agno = 8
        - agno = 1
        - agno = 9
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Fri Dec 21 16:09:54 2018

Phase		Start		End		Duration
Phase 1:	12/21 16:09:49	12/21 16:09:49
Phase 2:	12/21 16:09:49	12/21 16:09:49
Phase 3:	12/21 16:09:49	12/21 16:09:52	3 seconds
Phase 4:	12/21 16:09:52	12/21 16:09:52
Phase 5:	Skipped
Phase 6:	12/21 16:09:52	12/21 16:09:54	2 seconds
Phase 7:	12/21 16:09:54	12/21 16:09:54

Total run time: 5 seconds
root@Tower:/boot/config#
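
The same read-only check can be repeated for every data slot rather than only the disabled one. What follows is a minimal sketch, assuming the array is started in Maintenance mode, that all eight data slots use XFS, and that they appear as /dev/md1 through /dev/md8 as in the output above. The function name `check_all_slots` and the `CHECKER`, `SLOTS`, and `LOGDIR` variables are hypothetical names introduced for illustration; only `xfs_repair -n` comes from the thread.

```shell
# Read-only XFS check of each data slot. Nothing here writes to the disks.
# Assumptions (not from the thread): 8 slots, all XFS, devices /dev/md1../dev/md8.
# CHECKER is overridable only so the loop can be dry-run without real devices.
CHECKER="${CHECKER:-xfs_repair}"
SLOTS="${SLOTS:-1 2 3 4 5 6 7 8}"
LOGDIR="${LOGDIR:-/tmp}"

check_all_slots() {
    failed=0
    for n in $SLOTS; do
        dev="/dev/md${n}"
        log="${LOGDIR}/xfs_check_md${n}.log"
        # -n = no modify: report problems but never alter the filesystem
        "$CHECKER" -n "$dev" >"$log" 2>&1
        rc=$?
        if [ "$rc" -eq 0 ]; then
            echo "md${n}: clean"
        else
            echo "md${n}: check exited with status ${rc}, review ${log}"
            failed=1
        fi
    done
    return "$failed"
}
```

After pasting the function into a console session, running `check_all_slots` prints one line per slot and keeps the full xfs_repair output in a per-slot log. With `-n`, xfs_repair exits with status 1 when it detects corruption, so any slot not reported as clean deserves a look at its log before a rebuild is started.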

Advice on next steps?

December 22, 2018

Author

3 minutes ago, johnnie.black said:

Unassign disk8 and start the array. If the emulated disk mounts and the data looks correct, rebuild on top.

Er, if it doesn't know the filesystem to mount with, wouldn't it just format the drive?

Edited December 22, 2018 by Tybio

December 22, 2018

Author

12 minutes ago, johnnie.black said:

Unassign disk8 and start the array. If the emulated disk mounts and the data looks correct, rebuild on top.

Johnnie,

Sorry for being annoying here, but what will happen if I hit start and for some reason it can't mount the disks? Will it just fail, or try to initialize them?

December 22, 2018

Community Expert

Just fail.

December 22, 2018

Author

OK, it loaded up fine and is showing the files on the bad disk. Going to re-add it now and start the rebuild.

Thanks for the help!

December 22, 2018

Community Expert

40 minutes ago, Tybio said:

Tell me I didn't lose 50+TB of data please?

Just for future reference, it's almost impossible to do this without a massive, smoking, hardware failure, or a theft of your server.

Unlike RAID systems, each disk in Unraid (not RAID) is independent. Even if several of them did truly die, you could still get any files that are on the good ones.

December 22, 2018

Author

Rebuild finished, all seems well! Thanks for the help.
