UDMA CRC Error and disabled drive


Version 6.10.3 | Intel Core i3-2105 | LSI SAS9201-8i | WD EDFZ drives (12TB-14TB)

I had a power outage that caused an unclean shutdown and afterward one of my drives was having UDMA CRC errors and the media stored on it was inaccessible. Due to laziness and having media stored on other drives to utilize, I allowed this drive to exist in the array without replacing the cables for a couple months. Today I replaced the cables (using a 4 SATA breakout cable from the LSI device) assuming this would correct the error and allow me to access data stored on the drive. Rather than being fixed, now the machine boots up with Disk 1 disabled. When I look at SMART health it looks fine outside of these errors, so I am unsure if replacing the drive is required at this time. However, I want to minimize my risk of losing data and only have 1 parity drive and no experience with recovering data/drives so I want to make sure I don't do anything that will lose data (like improperly rebuilding the drive and overwriting with corrupted data or something like that). I have attached (what I believe is) the extended SMART self-test and the tower diagnostics tool data but I have no clue what to look for here.

Any tips would be great. It was a real bitch to get my LSI card firmware right iirc (I was having issues with booting and getting files from the drive if memory serves me well and I spent days troubleshooting this) but I am open to replacing that once I have done what I can from a software side of things. If the drive is toast, am I able to migrate the data to a new disk? My array has 6 disks and so far all the data I have stored on it has been going to Disk 1 and Disk 2 (the others have 1% utilization). I am not really keen on buying a fresh drive when I have so much empty space but I also understand that the proper way to rebuild lost data is replacing the drive and rebuilding it from parity and idk how the data is stored to drives with unraid, so presumably Disk 3-5 have data on them that is required to have fault protection for the array (idk if unraid is using like a striping model or whatever, I am a noob).

XFS Filesystem Check Results:
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

tower-smart-20220625-1617.zip

tower-diagnostics-20220626-1021.zip

Edited by musicjunkie07

The diagnostics show no obvious errors, but since the array has not been started we cannot tell whether ‘disk1’ is being correctly emulated. CRC errors alone do not make the data on a drive inaccessible, so I expect there is more we need to know before suggesting the best course of action. I would suggest posting new diagnostics taken with the array started.

 

Do you have a spare drive you can rebuild to? It is always best, if possible, to keep the contents of a disabled drive intact until a rebuild has completed successfully, as that gives a fallback option for data recovery if the rebuild fails in any way.
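
One way to confirm that the new cables actually fixed the CRC problem is to compare the raw value of SMART attribute 199 (UDMA_CRC_Error_Count) before and after the cable change. The counter is cumulative and does not reset, so what matters is whether it keeps increasing. Below is a minimal sketch in Python, assuming the text comes from something like `smartctl -A` for the drive. The helper name `udma_crc_error_count` and the sample rows are made up for illustration and are not taken from the attached diagnostics.

```python
def udma_crc_error_count(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is column 10.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Hypothetical sample in the usual smartctl attribute-table layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       14
"""

count = udma_crc_error_count(SAMPLE)
print(count)
```

If the number read after a few days of normal use is the same as the number read right after the cable swap, the connection problem is most likely resolved.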

9 hours ago, itimpi said:

The diagnostics show no obvious errors, but since the array has not been started we cannot tell whether ‘disk1’ is being correctly emulated. CRC errors alone do not make the data on a drive inaccessible, so I expect there is more we need to know before suggesting the best course of action. I would suggest posting new diagnostics taken with the array started.

 

Do you have a spare drive you can rebuild to? It is always best, if possible, to keep the contents of a disabled drive intact until a rebuild has completed successfully, as that gives a fallback option for data recovery if the rebuild fails in any way.

I went ahead and updated the diagnostics in the original post with a set taken while the array is started. Unfortunately I do not have a spare drive handy to rebuild to. If it comes to that, I would probably prefer risking the data over buying a new drive. None of the data is irreplaceable, so if rebuilding onto the same drive to get Disk 1 back online is an option, I am fine taking that risk rather than buying a new drive.


Looking at those diagnostics, it looks as if the ‘emulated’ disk1 is mounting fine. You might want to browse the emulated disk to see if the data looks like what you expect, as the rebuild process simply makes the physical disk match the emulated one. The process for rebuilding a disk onto itself is described in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI.

 

If the emulated disk does NOT look like you expect, then using the data from the physical disk instead can sometimes be a better option, but I suspect not in your case.
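
To illustrate what "emulated" means here, and why the other disks matter even when they are nearly empty: with a single parity drive, parity is the bitwise XOR of all the data disks at each position. Data is not striped; each data disk holds its own filesystem. The contents of a disabled disk are computed from parity plus every other data disk, and a rebuild writes that computed result to the physical disk. The following is a toy sketch of the idea in Python with made-up 4-byte "disks", not how Unraid is actually implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy data: three data disks of 4 bytes each (illustrative values only).
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x01, 0x02, 0x03, 0x04])
disk3 = bytes([0x00, 0x00, 0x00, 0x00])  # an almost empty disk still takes part

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk1 disabled, its contents are emulated from parity plus the others.
emulated_disk1 = xor_blocks([parity, disk2, disk3])

# A rebuild copies the emulated contents onto the physical disk.
rebuilt_disk1 = emulated_disk1
print(rebuilt_disk1 == disk1)
```

This is also why it is worth browsing the emulated disk first: whatever the emulation shows is exactly what the rebuild will write.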

 

35 minutes ago, itimpi said:

Looking at those diagnostics, it looks as if the ‘emulated’ disk1 is mounting fine. You might want to browse the emulated disk to see if the data looks like what you expect, as the rebuild process simply makes the physical disk match the emulated one. The process for rebuilding a disk onto itself is described in the online documentation, accessible via the ‘Manual’ link at the bottom of the GUI.

 

If the emulated disk does NOT look like you expect, then using the data from the physical disk instead can sometimes be a better option, but I suspect not in your case.

 

OK, so I can just browse the disk's files in Krusader to make sure everything looks fine, and if the data looks correct I can rebuild the drive and it will no longer show as disabled?
