Read Errors While Rebuilding to Upgrade a Disk



I bought a 4TB disk to replace the oldest and smallest disk in my array.

I did this by unassigning the old drive, shutting down, physically replacing the disk, assigning the new disk to that slot, and then initiating a rebuild.

While this rebuild was happening, I got read errors on another 4TB disk. The rebuild process is now stuck at 99.9%.

That disk had thrown read errors before, but I foolishly ignored them, since SMART showed no bad attributes, extended self-tests passed, and I replaced the corrupted files afterwards.
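(For reference, checking SMART from the command line looks roughly like this; /dev/sdX is a placeholder for the actual device:)

smartctl -a /dev/sdX           # full SMART report: attributes, error log, self-test log
smartctl -t long /dev/sdX      # start an extended (long) self-test
smartctl -l selftest /dev/sdX  # read the self-test results once it finishes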

 

The way I see it, there are two possibilities:

1. Revert my replacement back to the 1TB disk, and then use the new disk to replace the disk with errors.

2. Accept the loss of some data and recover it from another source. I don't know yet what data is affected, though.

 

I wouldn't know how to do either of those things, though.

Also, is it even necessary for the rebuild to go past 1TB?

 

(Out of desperation, I reset the stats at some point to see if that might do something.)


bigbong-diagnostics-20220625-0642.zip


Then I think your best bet is to use old disk2 and force Unraid to rebuild disk4. Parity still won't be 100% in sync, so some filesystem corruption is possible. You can do that by:

 

- Tools -> New Config -> Retain current configuration: All -> Apply
- Check all assignments and assign any missing disk(s) if needed, including old disk2 and new disk4; the replacement disk4 must be the same size as or larger than the old one
- IMPORTANT - check both "Parity is already valid" and "Maintenance mode", then start the array (the GUI will still warn that data on the parity disk(s) will be overwritten; this is normal, as the warning doesn't account for the checkbox, and nothing will be overwritten as long as it's checked)
- Stop the array
- Unassign disk4
- Start the array (in normal mode now); ideally the emulated disk4 will now mount and its contents will look correct, and if it doesn't, you should run a filesystem check on the emulated disk (see the sketch after this list)
- If the emulated disk mounts and the contents look correct, stop the array
- Re-assign disk4 and start the array to begin the rebuild
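A read-only check along these lines will show whether the emulated filesystem is repairable without changing anything (a minimal sketch, assuming disk4 is XFS and the array is started in maintenance mode; -n means no modifications are made):

xfs_repair -n /dev/md4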

 

For now keep old disk4 intact in case it's needed.


I shut down and restarted to swap the drives back, forgetting that the array would start normally on boot. As a result, a number of writes were made even though I didn't change any data myself (docker containers, perhaps?).

I proceeded with the process anyway. All of the data contents appear good.

 

With disk 4 unassigned from the array, its contents do not appear within the pool.

Additionally, old disk 4 now shows as "Unmountable", although it can still be mounted and read with the Unassigned Devices plugin.
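(Mounting it manually works as well; a minimal sketch, where /dev/sdX1 and the mount point are placeholders for the actual partition and directory:)

mkdir -p /mnt/old_disk4
mount -o ro /dev/sdX1 /mnt/old_disk4   # read-only, so nothing on the old disk gets changed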

 

I don't know if Unraid can rebuild disk 4 to a replacement in this state, although I suppose a rebuild would have a chance of recovering the errored-out data, as opposed to simply copying old disk 4 to the replacement.

6 minutes ago, 404fox said:

When unassigned from the array, the contents of disk 4 do not appear within the pool.

Do you mean the emulated disk4 is unmountable? If yes:

1 hour ago, JorgeB said:

ideally the emulated disk4 will now mount and its contents will look correct, and if it doesn't, you should run a filesystem check on the emulated disk

 

If it's not that, post new diags.


No signs of a valid filesystem were found on the emulated disk4, which is very strange assuming parity was valid. Try running a manual check to see if a backup superblock can be found, though I'm not very optimistic. With the array started in maintenance mode, type:

 

xfs_repair -v /dev/md4

 

 

root@BigBong:~# xfs_repair -v /dev/md4
Phase 1 - find and verify superblock...
bad primary superblock - inconsistent filesystem geometry information !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
        - block cache size set to 735560 entries
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
zero_log: head block 3035810 tail block 3035810
        - scan filesystem freespace and inode maps...
sb_icount 0, counted 1504704
sb_ifree 0, counted 5195
sb_fdblocks 976277679, counted 40592957
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 1
Phase 5 - rebuild AG headers and trees...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
Note - stripe unit (0) and width (0) were copied from a backup superblock.
Please reset with mount -o sunit=<value>,swidth=<value> if necessary

        XFS_REPAIR Summary    Sun Jun 26 07:52:29 2022

Phase           Start           End             Duration
Phase 1:        06/26 07:48:35  06/26 07:48:36  1 second
Phase 2:        06/26 07:48:36  06/26 07:48:38  2 seconds
Phase 3:        06/26 07:48:38  06/26 07:50:43  2 minutes, 5 seconds
Phase 4:        06/26 07:50:43  06/26 07:50:44  1 second
Phase 5:        06/26 07:50:44  06/26 07:50:46  2 seconds
Phase 6:        06/26 07:50:46  06/26 07:52:28  1 minute, 42 seconds
Phase 7:        06/26 07:52:28  06/26 07:52:28

Total run time: 3 minutes, 53 seconds
done

 

Here is the result. It worked, and the emulated contents can be read without issue.

Should I go ahead with rebuilding to the replacement disk?

