Gico posted June 25, 2023 (edited)

Sequence of events:
1. I had a working array with one parity drive and a spare disk already in the server.
2. Stopped the array, assigned the spare disk as the 2nd parity and started the array.
3. Disk 5 immediately turned invalid (red X).
4. I stopped the array, unassigned disk 5, started the array, stopped it, reassigned disk 5 and started it again, so the 2nd parity and disk 5 are being rebuilt.
5. Overnight, disk 11 got read errors but is still in the array (no red X).

The parity build seems to continue, but on Main I can see no writes to parity 2 or to disk 5, and also no reads from disk 11. I can access the filesystems of these two disks, but the file data is corrupted. I had issues with red Xs on multiple disks in the past, caused by insufficient cooling of the server, in particular of the onboard HBA. I'm guessing the data on these disks is OK, but that needs to be checked while they are unassigned. Any ideas on how to continue now, please?

Update:
- Disk 11 has actually dropped from the array, because it appears in the Unassigned Devices list.
- I can't stop the parity operation through the web interface: it just continues when I try. How can I stop it through the CLI?

juno-diagnostics-20230625-0704.zip
JorgeB posted June 25, 2023

Type reboot in the console; if it doesn't reboot after 5 minutes you will need to force it. Then replace disk 11's cables / swap slots and try again.
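For reference, the parity operation can also be cancelled from the console before rebooting. A minimal sketch, assuming Unraid's mdcmd helper is available on the console (the command name and verb are assumptions based on Unraid's md driver tooling; verify on your version):

```shell
# Sketch: cancel a running parity check/sync from the Unraid console.
# mdcmd is Unraid-specific; on a non-Unraid box this just reports it.
cancel_parity() {
    if command -v mdcmd >/dev/null 2>&1; then
        mdcmd nocheck          # asks the md driver to stop the parity operation
        echo "parity operation cancelled"
    else
        echo "mdcmd not found (not an Unraid console?)"
    fi
}
cancel_parity
# If the web UI is unresponsive, a clean reboot from the console is:
#   reboot
# and only if nothing happens after ~5 minutes, force it:
#   reboot -f
```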
Gico posted June 25, 2023 (Author)

I rebooted, started in maintenance mode and executed xfs_repair -n for disks 5 and 11 (logs attached). They seemed OK to me, so I did a new config, preserved all current assignments and started the array. Disk 11 is mounted; disk 5 is not (syslog follows), so now I think the only option is maintenance mode and xfs_repair with correction, right? The disk's fs type is set to "auto", so I can't execute the command from the GUI. Any ideas? What is the command I need to run?

Jun 25 13:59:01 Juno emhttpd: mounting /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (279): mkdir -p /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (280): mount -t xfs -o noatime,nouuid /dev/md5p1 /mnt/disk5
Jun 25 13:59:01 Juno kernel: XFS (md5p1): Mounting V5 Filesystem
Jun 25 13:59:02 Juno kernel: XFS (md5p1): Corruption warning: Metadata has LSN (4:1317887) ahead of current LSN (4:1316525). Please unmount and run xfs_repair (>= v4.3) to resolve.
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount/recovery failed: error -22
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount failed
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root: dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (280): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5: no btrfs or device /dev/md5p1 is not single
Jun 25 13:59:02 Juno emhttpd: /usr/sbin/zpool import -d /dev/md5p1 2>&1
Jun 25 13:59:02 Juno emhttpd: no pools available to import
Jun 25 13:59:02 Juno emhttpd: disk5: no uuid
Jun 25 13:59:02 Juno emhttpd: shcmd (281): mount -t reiserfs -o noatime,user_xattr,acl /dev/md5p1 /mnt/disk5
Jun 25 13:59:02 Juno kernel: REISERFS warning (device md5p1): sh-2021 reiserfs_fill_super: can not find reiserfs on md5p1
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root: dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (281): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5 mount error: Unsupported or no file system
Jun 25 13:59:02 Juno emhttpd: shcmd (282): rmdir /mnt/disk5

xfs_repair.Disk11.txt
xfs_repair.Disk5.txt
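For anyone following along, the no-modify check described above can be run from the console in Maintenance mode. A sketch, with the device name /dev/md5p1 taken from the syslog above (it differs per disk and per Unraid version, so check the log before running it):

```shell
# Read-only XFS check: -n means no-modify, report problems only.
# /dev/md5p1 is the array device for disk5 as seen in the syslog;
# on another system or disk the name will differ.
check_fs() {
    DEV=${1:-/dev/md5p1}
    if [ -b "$DEV" ]; then
        xfs_repair -n "$DEV"
    else
        echo "skipped: $DEV is not a block device here"
    fi
}
check_fs /dev/md5p1
```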
itimpi posted June 25, 2023

If you are sure the disk is XFS format, then with the array stopped you can click on it on the Main tab and explicitly set the format to xfs. When you then start in Maintenance mode you will be offered the option to run xfs_repair via the GUI.
Gico posted June 25, 2023 (Author)

Yes, I'm sure 🙂 Anyway, this saga has ended for now: I stopped the array, changed the fs type to xfs and started in maintenance mode. xfs_repair -n still seemed OK. xfs_repair generated this error:

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

So I stopped the array and tried again (unsuccessfully) to mount. Next was maintenance mode again and xfs_repair -L, which WAS SUCCESSFUL. All drives are mounted, the content of disk 5 seems OK, and Parity-Sync is running.

Thanks a lot @JorgeB and @itimpi!
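The sequence that resolved disk 5 can be sketched as console commands. This assumes Maintenance mode and the /dev/md5p1 device from the earlier syslog; note that xfs_repair -L zeroes the journal and can lose the last in-flight metadata updates, which is why the error message asks you to attempt a mount first:

```shell
# Sketch of the repair sequence for disk5. Device and mount point are
# taken from the syslog in this thread; guarded so it no-ops elsewhere.
repair_disk5() {
    DEV=/dev/md5p1
    MNT=/mnt/disk5
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    xfs_repair -n "$DEV"                    # 1. dry run: assess the damage
    mkdir -p "$MNT"
    if mount -t xfs "$DEV" "$MNT"; then     # 2. try to mount, replaying the log
        umount "$MNT"
        xfs_repair "$DEV"                   # 3. normal repair after log replay
    else
        xfs_repair -L "$DEV"                # 4. last resort: zero the log
    fi
}
repair_disk5
```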
Gico posted June 27, 2023 (Author)

Parity-Sync is going quite slowly, at about 55 MB/s, sometimes even slower. Disk 5 again got some read errors. It still has a green dot, but its read speed is about twice that of the other disks, so I guess it has issues and is delaying the Parity-Sync. Occasionally disk 11 also reads faster than the rest of the disks. Both disks are the same model (Seagate 16TB Exos X16) and were bought together on Amazon. They are probably out of warranty.

My questions:
- Can I pause the Parity-Sync, shut down the server to check/replace cables, and continue the Parity-Sync after boot?
- Can I somehow check the disks before starting the array again? Maybe mount them read-only and do some kind of read test?

SMART reports from both disks attached.
juno-smart-20230627-1006.zip
juno-smart-20230627-0927.zip
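On the second question, a disk can be exercised without writing to it: pull its SMART data and read the whole surface sequentially. A sketch with a placeholder device name (/dev/sdX is not a real device here; substitute the identifier shown on the Main tab):

```shell
# Non-destructive checks for a suspect disk. /dev/sdX is a placeholder;
# the guard makes this a no-op when the device does not exist.
read_test() {
    DEV=${1:-/dev/sdX}
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    smartctl -a "$DEV"                                # health, attributes, error log
    dd if="$DEV" of=/dev/null bs=1M status=progress   # full-surface sequential read
}
read_test /dev/sdX
```

Read errors during the dd pass will show up both in its output and in the kernel log (dmesg), without modifying the disk.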
Gico posted June 27, 2023 (Author)

Attached.
juno-diagnostics-20230627-1201.zip
JorgeB posted June 27, 2023 (marked as Solution)

Parity cannot be correctly built with read errors on data disks. I would cancel the sync, replace the cables for disk 5 and try again.
Gico posted June 27, 2023 (Author, edited)

But if the disk still has a green dot, doesn't that mean the filesystem overcame this issue and the data was read successfully?
JorgeB posted June 27, 2023

Quoting Gico: "But if the disk still has a green dot"

A green dot only means that the disk is not disabled. For parity to be successfully built there can't be any read errors on any data disk; if you continue, parity won't be valid for the affected sectors.
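To see why, single parity is effectively the bitwise XOR across all data disks, so every data byte must be readable for the corresponding parity byte to be computed correctly; a byte that can't be read leaves that parity byte wrong. A toy illustration with three one-byte "disks":

```shell
# Toy illustration of XOR parity: parity is the XOR of the data disks' bytes.
d1=170; d2=204; d3=15            # bytes from three data disks
parity=$(( d1 ^ d2 ^ d3 ))
echo "parity byte: $parity"      # prints 105
# With parity intact, any single missing disk can be reconstructed:
echo "rebuilt d2:  $(( parity ^ d1 ^ d3 ))"   # prints 204, matching d2
# If d2's byte could not be read while building parity, the parity byte
# for this position would be wrong, and a later rebuild from it would
# silently produce wrong data for that sector.
```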
JorgeB posted June 27, 2023

If you prefer, you can let it finish and then run a correcting parity check once the issue is solved, but that will take longer.
Gico posted June 27, 2023 (Author, edited)

I cancelled, but why won't Unraid stop the parity sync if it's unreliable? Similarly, when I first added the 2nd parity drive and one disk was disabled, the parity sync continued without writing further to the new parity disk, so again, IMO Unraid should have stopped the Parity-Sync. Will update when I get home.

Edit: How do the SMART reports of disk 5 and disk 11 look? Are they OK, apart from the read issues, which might be caused by cables?
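On the SMART question, the attributes that usually separate cable trouble from a failing disk can be pulled with smartctl. A sketch with a placeholder device (the attribute names are the common SATA ones; SAS drives report error counters in a different format, so adjust accordingly):

```shell
# Triage the SMART attributes that distinguish link/cable errors from
# media errors. /dev/sdX is a placeholder for disk5 or disk11.
smart_triage() {
    DEV=${1:-/dev/sdX}
    if [ ! -b "$DEV" ]; then
        echo "skipped: $DEV is not a block device here"
        return 0
    fi
    # 199 UDMA_CRC_Error_Count: transfer errors, typically cabling/backplane.
    # 5 Reallocated_Sector_Ct, 197 Current_Pending_Sector,
    # 198 Offline_Uncorrectable: the disk surface itself.
    smartctl -A "$DEV" | grep -E 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable|UDMA_CRC'
}
smart_triage /dev/sdX
```

A rising CRC count with zero reallocated/pending sectors is consistent with the cable theory in this thread; pending or reallocated sectors point at the disk.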
JorgeB posted June 27, 2023

Quoting Gico: "I cancelled, but why won't Unraid stop the parity sync if it's unreliable?"

Because if it were a disk problem, it would still create parity; it wouldn't be 100% in sync, but almost, which is better than nothing.
Gico posted June 27, 2023 (Author)

SAS cable replaced, although no external damage could be seen. Started the array and Parity-Sync. All disks have the same read speed, similar to previous parity syncs.
Gico posted July 2, 2023 (Author)

Parity-Sync completed successfully. Thanks for the help.