Disk 1 disabled (after many read errors), but contents not actually emulated by parity drive


Go to solution Solved by wilsonhomelab,


A month ago, I moved my array (2x IronWolf 8TB, 8 months old) from desktop hardware to server hardware (dual Xeon E5-2680 v4, 64GB Samsung ECC memory, Supermicro X10DRL-i server motherboard). Unraid recognised all drives (parity, disk 1) in the array and ran smoothly for a couple of weeks. Ten days ago, I purchased an additional IronWolf 8TB (disk 2) and added it to the array after a preclear. I then ran a parity check without any problem, at an average speed of 220MB/s.

 

Last week, disk 1 started showing errors after I transferred some media files from unassigned drives using Krusader (Docker container). I tried swapping SATA cables and SATA ports, but the disk errors only got worse (from 4 errors to 80 errors). Then I tried a parity check, which ran at only KB/s speeds, so I paused it. I knew something was wrong. I quickly checked the diagnostics, which showed "ata6: hard resetting link" and "ata6: link is slow to respond, please be patient (ready=0)". unraid-xeon-diagnostics-20220312-1735.zip
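
For anyone triaging similar messages, a rough way to see which SATA link keeps resetting is to count the reset lines per port in a saved syslog. This is only a sketch: the helper name `count_link_resets` and the example log path are assumptions, and it matches only the "hard resetting link" message quoted above.

```shell
#!/bin/sh
# Count "hard resetting link" kernel messages per ATA port in a log file.
# count_link_resets is a hypothetical helper name, not an Unraid tool.
count_link_resets() {
    logfile="$1"
    # Extract the port label (ata6, ata3, ...) that precedes the message,
    # then tally how often each label appears.
    grep -o 'ata[0-9][0-9.]*: hard resetting link' "$logfile" \
        | cut -d: -f1 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example (path is an assumption; use the syslog from your own diagnostics zip):
# count_link_resets /var/log/syslog
```

Resets concentrated on one port point at that drive, its cable, or its power feed; resets spread across several ports sharing a cage or cable may hint at something they have in common.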

 

Last night, I saw that the accumulated disk 1 errors had reached 800 (after a week). To rule out problems with the motherboard SATA ports and the SATA cables I was using, I installed a known-good LSI SAS2008 with a known-good SFF-8087-to-4x-SATA cable. After booting Unraid, all disks were recognised, but disk 1 and disk 2 were both "unmountable" with the option to format. unraid-xeon-diagnostics-20220321-0240.zip

 

I then upgraded the motherboard BIOS to the latest version and ran Memtest86+ for 6 hours without errors. 365995768_iKVM_Memtest86.jpg.7aa16a8c21428f82e5249f308668947a.jpg

 

This morning, I restarted Unraid and saw disk 1 "unmountable"; parity and disk 2 are working. The shares stored on disk 1 are not showing up (not emulated at all). Why is emulation not working with only one disk down? unraid-xeon-diagnostics-20220321-0909.zip

 

770576691_ScreenShot2022-03-21at10_39_43am.thumb.png.69539af86e34efcf6c027ef9ef737eb1.png

 

Could you please also advise what I should do now? Should I repair the XFS file system? Please help.

 

 

 

Edited by wilsonhomelab
Link to comment

I clicked the "CHECK" button under the disk 1 settings, and nothing happened, so I typed the following command:

 

root@UNRAID-Xeon:~# xfs_repair /dev/md1
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

Could you please explain what the "ERROR" section means? Thanks.
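
In short, the message says the XFS journal still holds changes that were never written out, and that mounting the file system is the safe way to replay them. The order that xfs_repair itself suggests can be sketched as below. This is a sketch only: the function names, the `DRY_RUN` switch, and the scratch mount point `/mnt/xfs_test` are assumptions; `/dev/md1` is the device from the post, and on Unraid the array would normally be started in maintenance mode first so the disk is not already in use.

```shell
#!/bin/sh
# Sketch of the order suggested by the xfs_repair message:
# 1) mount to replay the log, 2) unmount, 3) check, 4) only then consider -L.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

replay_then_check() {
    dev="$1"   # e.g. /dev/md1, as in the post
    mnt="$2"   # scratch mount point (assumed, e.g. /mnt/xfs_test)

    run mkdir -p "$mnt"
    # Mounting replays the journal; a failure here means it cannot be replayed.
    if run mount -t xfs "$dev" "$mnt"; then
        run umount "$mnt"
        # -n: check only, make no changes.
        run xfs_repair -n "$dev"
    else
        echo "mount failed; xfs_repair -L $dev is the last resort and may lose recent changes"
    fi
}

# Example (prints the plan only while DRY_RUN=1):
# replay_then_check /dev/md1 /mnt/xfs_test
```

Keeping the destructive `-L` step out of the automated path is deliberate: zeroing the log discards whatever metadata updates it contained, so it is better left as a conscious manual decision.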

Link to comment

I went ahead with the -L option. Now I can see the disk 1 content!😀

 

If I understand correctly, will the parity start rebuilding the physical disk 1 once I restart the array with disk 1 plugged in? 

 

I also want to find out whether this issue is an indication of a failing drive. May I perform a preclear of disk 1 first to see if it can withstand the heavy read/write process? If it is good, I will put it back in the array for the rebuild.

 

root@UNRAID-Xeon:~# xfs_repair /dev/md1 -L
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
        - 04:55:12: zeroing log - 119233 of 119233 blocks done
        - scan filesystem freespace and inode maps...
        - 04:55:13: scanning filesystem freespace - 32 of 32 allocation groups done
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - 04:55:13: scanning agi unlinked lists - 32 of 32 allocation groups done
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 15
        - agno = 30
        - agno = 16
        - agno = 17
        - agno = 1
        - agno = 31
        - agno = 18
        - agno = 2
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 3
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 4
        - agno = 5
        - agno = 29
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - 04:55:24: process known inodes and inode discovery - 148992 of 148992 inodes done
        - process newly discovered inodes...
        - 04:55:24: process newly discovered inodes - 32 of 32 allocation groups done
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - 04:55:24: setting up duplicate extent list - 32 of 32 allocation groups done
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 3
        - agno = 2
        - agno = 7
        - agno = 9
        - agno = 11
        - agno = 12
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 13
        - agno = 28
        - agno = 5
        - agno = 1
        - agno = 8
        - agno = 19
        - agno = 17
        - agno = 6
        - agno = 18
        - agno = 20
        - agno = 4
        - agno = 15
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 10
        - agno = 27
        - agno = 16
        - agno = 29
        - agno = 30
        - agno = 14
        - agno = 31
        - 04:55:24: check for inodes claiming duplicate blocks - 148992 of 148992 inodes done
Phase 5 - rebuild AG headers and trees...
        - 04:55:25: rebuild AG headers and trees - 32 of 32 allocation groups done
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
        - 04:55:31: verify and correct link counts - 32 of 32 allocation groups done
Maximum metadata LSN (6:909635) is ahead of log (1:2).
Format log to cycle 9.
done

 

1119354686_ScreenShot2022-03-22at5_04_05am.thumb.png.84a8a85a50a91a726738816161e8667b.png

Link to comment

Putting the disk back will cause it to be rebuilt to match the emulated drive.

 

39 minutes ago, wilsonhomelab said:

May I perform a preclear of disk 1 first to see if it can withstand the heavy read/write process? If it is good, I will put it back in the array for the rebuild.

That is fine.   You could also try running an extended SMART test on the drive before adding it back.
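
As a rough illustration of what to look at once an extended test has finished, the sketch below pulls a few attributes out of saved `smartctl -A` output that are commonly used to tell a failing disk apart from a cabling or link problem. The helper name `smart_summary` and the choice of attributes are assumptions, not an Unraid feature; the test itself would be started with `smartctl -t long /dev/sdX`, where `sdX` is a placeholder for the actual device.

```shell
#!/bin/sh
# Print selected SMART attributes and their raw values from `smartctl -A` output.
# smart_summary is a hypothetical helper; it reads the report on stdin.
smart_summary() {
    # In the attribute table, column 2 is the name and column 10 the raw value.
    awk '$2 == "Reallocated_Sector_Ct" ||
         $2 == "Current_Pending_Sector" ||
         $2 == "UDMA_CRC_Error_Count" { print $2 "=" $10 }'
}

# Example (device name is a placeholder):
# smartctl -A /dev/sdX | smart_summary
```

Non-zero reallocated or pending sectors tend to implicate the disk itself, while a growing CRC error count with clean sector counters more often points at the cable or connection.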

Link to comment
  • Solution

I think I finally found the culprit. I believe the I/O errors actually came from insufficient SATA power. Because of a clearance problem, I had used a Molex-to-2x-15-pin-SATA cable to power the SilverStone 5-HDD hot-swap cage. I never ran into problems until I populated all 5 HDDs in a single cage (I have two of the cages). I found a similar issue, which prompted me to try powering the cage with two dedicated SATA cables straight from the PSU.

415380546_ScreenShot2022-03-25at12_00_03pm.png.d94d0e25c60219831c9259e0571234a8.png

 

The end result is that disk 1 was precleared at roughly double the data rate, 200+MB/s instead of 90MB/s. The rebuild finished in 10 hours at an average of 202MB/s. I wish the system log could be more precise about the I/O error ("hard resetting link"), or that the drive itself would report an under-power condition. I hope this will be helpful for someone else.

602203752_ScreenShot2022-03-25at10_16_52am.thumb.png.c67fb537e994ff05aecc56e92fd7181b.png
