
[Solved] Red X --> Parity Built --> Read Errors & Corrupted Data


Solved by JorgeB


Sequence of events:

1. I had a working array with one parity drive and a spare disk already in the server.

2. Stopped the array, assigned the spare disk to 2nd parity and started the array.

3. Disk 5 was disabled (red X) immediately.

4. I stopped the array, unassigned disk 5, started the array, stopped it, reassigned disk 5 and started it again, so both 2nd parity and disk 5 were being rebuilt.

5. Overnight, disk 11 got read errors but stayed in the array (no red X). The parity build seems to continue, but on the Main page I can see no writes to parity 2 or disk 5, and no reads from disk 11. I can access the file systems of these 2 disks, but the file data is corrupted.

 

In the past I had red X's on multiple disks that were caused by insufficient cooling of the server, and in particular of the onboard HBA. I'm guessing that the data on these disks is OK, but this needs to be checked while they are unassigned.

 

Any ideas on how to continue from here, please?

 

Update:

- Disk 11 actually dropped from the array: it now appears in the Unassigned Devices list.

- I can't stop the parity operation through the web interface: it just continues when I try. How can I stop it through the CLI?
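
On the CLI question above, here is a minimal sketch. It assumes Unraid's `mdcmd` helper is on the PATH and that its `nocheck` argument cancels a running parity operation; both are assumptions about the installed release, not something confirmed in this thread. The `run` wrapper and the `DRY_RUN` variable are hypothetical names, used so the command can be previewed before it is executed for real.

```shell
#!/bin/bash
# Sketch: cancel a running parity operation from the console.
# Assumption: mdcmd is on PATH and "mdcmd nocheck" cancels the operation.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Ask the md driver to stop the current check/sync (assumed behavior).
cancel_parity() {
    run mdcmd nocheck
}
```

Calling `cancel_parity` as-is only prints the command; setting `DRY_RUN=0` beforehand would actually execute it.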

juno-diagnostics-20230625-0704.zip

Edited by Gico

I rebooted, started in maintenance mode and ran xfs_repair -n on disks 5 and 11. Logs attached.

They seemed OK to me, so I did a New Config, preserved all current assignments and started the array.

Disk 11 is mounted, disk 5 is not (syslog below), so now I think the only option is maintenance mode and xfs_repair with corrections enabled, right?

The disk's fs type is set to "auto", so I can't run the command from the GUI.

Any ideas? What is the command I need to run?

 

Jun 25 13:59:01 Juno emhttpd: mounting /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (279): mkdir -p /mnt/disk5
Jun 25 13:59:01 Juno emhttpd: shcmd (280): mount -t xfs -o noatime,nouuid /dev/md5p1 /mnt/disk5
Jun 25 13:59:01 Juno kernel: XFS (md5p1): Mounting V5 Filesystem
Jun 25 13:59:02 Juno kernel: XFS (md5p1): Corruption warning: Metadata has LSN (4:1317887) ahead of current LSN (4:1316525). Please unmount and run xfs_repair (>= v4.3) to resolve.
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount/recovery failed: error -22
Jun 25 13:59:02 Juno kernel: XFS (md5p1): log mount failed
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root:        dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (280): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5: no btrfs or device /dev/md5p1 is not single
Jun 25 13:59:02 Juno emhttpd: /usr/sbin/zpool import -d /dev/md5p1 2>&1
Jun 25 13:59:02 Juno emhttpd: no pools available to import
Jun 25 13:59:02 Juno emhttpd: disk5: no uuid
Jun 25 13:59:02 Juno emhttpd: shcmd (281): mount -t reiserfs -o noatime,user_xattr,acl /dev/md5p1 /mnt/disk5
Jun 25 13:59:02 Juno kernel: REISERFS warning (device md5p1): sh-2021 reiserfs_fill_super: can not find reiserfs on md5p1
Jun 25 13:59:02 Juno root: mount: /mnt/disk5: wrong fs type, bad option, bad superblock on /dev/md5p1, missing codepage or helper program, or other error.
Jun 25 13:59:02 Juno root:        dmesg(1) may have more information after failed mount system call.
Jun 25 13:59:02 Juno emhttpd: shcmd (281): exit status: 32
Jun 25 13:59:02 Juno emhttpd: /mnt/disk5 mount error: Unsupported or no file system
Jun 25 13:59:02 Juno emhttpd: shcmd (282): rmdir /mnt/disk5
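
Most of the log above is emhttpd falling back through other file system types; the lines that matter are the kernel's XFS messages. As a rough sketch, they could be pulled out like this. The function name `xfs_messages` is hypothetical, and the default path `/var/log/syslog` is an assumption about where the syslog lives on the server.

```shell
#!/bin/bash
# Print only the kernel XFS messages for one md device from a syslog file.
# Usage: xfs_messages <device> [syslog-path]
# Assumption: the syslog is at /var/log/syslog unless a path is given.
xfs_messages() {
    local dev="$1"
    local log="${2:-/var/log/syslog}"
    # -F matches the string literally, so the parentheses need no escaping.
    grep -F "kernel: XFS (${dev}):" "$log"
}
```

For the log in this post, `xfs_messages md5p1` would keep the "Mounting V5 Filesystem", "Corruption warning" and "log mount" lines and drop the rest.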

 

xfs_repair.Disk11.txt xfs_repair.Disk5.txt


Yes I'm sure 🙂

Anyway, this saga has ended for now:

Stopped the array, changed the fs type to xfs and started in maintenance mode.

 

xfs_repair -n still seemed ok.

 

xfs_repair (without -n) generated this error:

ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

So I stopped the array and tried again (unsuccessfully) to mount.

Next was maintenance mode again and xfs_repair -L, which WAS SUCCESSFUL.

All drives are mounted, content of disk5 seems ok, and Parity-Sync is running.
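
The sequence described in this post can be summarized as three escalating xfs_repair invocations. The sketch below only prints the command for each stage and runs nothing. The function name `repair_cmd` and the stage names are hypothetical, and the default device `/dev/md5p1` is taken from the syslog earlier in the thread, so it would differ for another disk. The array is assumed to be started in maintenance mode.

```shell
#!/bin/bash
# Print the xfs_repair command for a given stage; nothing is executed.
# Usage: repair_cmd check|repair|force [device]
# Assumption: array is in maintenance mode and the disk is /dev/md5p1.
repair_cmd() {
    local stage="$1"
    local dev="${2:-/dev/md5p1}"
    case "$stage" in
        check)  echo "xfs_repair -n $dev" ;;  # no-modify: report problems only
        repair) echo "xfs_repair $dev" ;;     # normal repair; refuses on a dirty log
        force)  echo "xfs_repair -L $dev" ;;  # zeroes the log; last resort only
        *)
            echo "usage: repair_cmd check|repair|force [device]" >&2
            return 1
            ;;
    esac
}
```

As the error message quoted above says, `-L` destroys the log and may cause corruption, so it belongs after a failed mount attempt, which is the order followed here.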

 

Thanks a lot @JorgeB and @itimpi!

  • Gico changed the title to [Solved] Red X --> Parity Built --> Read Errors & Corrupted Data

Parity-Sync is going quite slowly at about 55MB/sec, sometimes even slower. Disk 5 got some read errors again. It still has a green dot, but its read speed is about twice that of the other disks, so I guess it has issues and is delaying the Parity-Sync. Occasionally disk 11 also reads faster than the rest of the disks. Both disks are the same model (Seagate 16TB HDD Exos X16) and were bought together on Amazon. They are probably out of warranty.

My questions:

- Can I pause the Parity-Sync, shut down the server to check/replace cables and continue the Parity-Sync after boot?

- Can I somehow check the disks before starting the array again? Maybe mount them read-only and run some kind of data read test?

SMART reports from both disks are attached.

juno-smart-20230627-1006.zip juno-smart-20230627-0927.zip
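
On the data-read-test question, one possible approach is a plain sequential read that discards the data, sketched below. It only reads, so it does not touch the disk contents. The function name `read_test` is hypothetical, and any device path passed to it (for example `/dev/sdX`) is a placeholder that has to be replaced with the real device. It is a sketch of the idea, not a procedure recommended anywhere in this thread.

```shell
#!/bin/bash
# Read a device or file from start to end and discard the data.
# A non-zero exit status from dd is reported as a read failure.
# Usage: read_test <device-or-file>   (e.g. read_test /dev/sdX, placeholder name)
read_test() {
    local target="$1"
    if dd if="$target" of=/dev/null bs=1M 2>/dev/null; then
        echo "read OK: $target"
    else
        echo "read FAILED: $target"
        return 1
    fi
}
```

A full pass over a 16TB disk takes many hours, and a clean pass does not rule out cable problems that only show up intermittently.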


I cancelled, but why doesn't Unraid stop the parity sync if it's unreliable?

Similarly, when I first added the 2nd parity drive and one disk was disabled, the parity sync continued without writing any further to the new parity disk, so again, IMO Unraid should have stopped the parity sync.

 

Will update when I get home.

 

Edit: What do the SMART reports of disk 5 and disk 11 look like? Are they OK apart from the read issues, which might be caused by cables?

Edited by Gico