2 disks disabled with read errors




Hi all,

 

It started with Disk 9 suddenly reporting "UDMA CRC Error Count" errors and getting disabled. I changed the SATA cable and port and rebuilt the array. After a few hours Disk 7 got disabled because of read errors, and Disk 8 also had 300k+ read errors but wasn't disabled.

 

I think there is something wrong with my setup, as I'm getting a lot of UDMA CRC errors on different drives from time to time. I'm starting to lose faith in it.
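For anyone following along: the CRC count lives in SMART attribute 199, and as far as I know its raw value is a lifetime counter that does not reset, so the useful signal is whether it keeps rising after a cable or port change. Below is a minimal Python sketch for pulling that value out of saved `smartctl -A` text and comparing two snapshots. The sample table and its numbers are made up for illustration, and the helper names are hypothetical.

```python
import re

# Illustrative sample in the column layout printed by `smartctl -A`.
# The values are invented, not taken from the diagnostics in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""


def crc_error_count(smart_text):
    """Return the raw UDMA_CRC_Error_Count, or None if the attribute is absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def crc_increased(before_text, after_text):
    """True if the counter grew between two snapshots, None if either is missing."""
    before = crc_error_count(before_text)
    after = crc_error_count(after_text)
    if before is None or after is None:
        return None
    return after > before
```

Taking one snapshot before swapping a cable and another a day or two later, then calling `crc_increased(before, after)`, shows whether the swap actually stopped the errors on that drive.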

 

I have 14 drives connected to:

- Motherboard: 6x SATA ports

- 1x SAS card, 2 ports broken out to 8 SATA

- 1x SATA card with 2 SATA ports

 

Based on your experience, what is your recommended next action?

Thank you.

 

tower-diagnostics-20221009-1751.zip

Link to comment
5 minutes ago, JorgeB said:
Oct 10 23:22:19 Tower kernel: ata10: reset failed, giving up
Oct 10 23:22:19 Tower kernel: ata10.00: disabled

Disk9 dropped offline again, did you replace both cables?

 

Yes, I did. Let me replace all the SATA power cables and try again. I'm using a StarTech SATA power splitter.

Thank you for your response. If you have any other suggestions for the SATA and power cables, please let me know.
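Since the same kernel messages keep coming up, here is a rough Python sketch for scanning a saved syslog for ATA ports that the kernel gave up on. It only recognises the two message shapes quoted above ("reset failed, giving up" and "disabled"); the function names are hypothetical, and mapping an `ataN` port back to a specific disk still has to be done by hand from the rest of the log.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Oct 10 23:22:19 Tower kernel: ata10: reset failed, giving up
#   Oct 10 23:22:19 Tower kernel: ata10.00: disabled
ATA_EVENT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def ata_events(log_text):
    """Group kernel ATA messages by port name (ata10, ata11, ...)."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = ATA_EVENT.search(line)
        if match:
            events[match.group(1)].append(match.group(2))
    return dict(events)


def dropped_ports(log_text):
    """Ports where the kernel stopped retrying or disabled the device."""
    return sorted(
        port
        for port, messages in ata_events(log_text).items()
        if any(msg == "disabled" or "giving up" in msg for msg in messages)
    )
```

Run against the two lines quoted above, `dropped_ports` reports `ata10`. If the same port shows up again after a cable swap, that points at the port, controller, or power side rather than the data cable.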

Link to comment
On 10/11/2022 at 12:09 PM, JorgeB said:
Oct 10 23:22:19 Tower kernel: ata10: reset failed, giving up
Oct 10 23:22:19 Tower kernel: ata10.00: disabled

Disk9 dropped offline again, did you replace both cables?

 

I replaced all the splitter cables, checked all the connections, and restarted the server. I removed all the SATA splitters and now only use two Molex to 2x SATA adapters.

Disks 7 and 9 are both disabled and shown as "Unmountable disks".

 

Is there any way to rebuild them using the parity?

Do you think it's wise to mount the drives on Windows and copy all the files from these 2 disks before rebuilding?

 

I appreciate your advice. Thank you.

tower-diagnostics-20221015-2337.zip

Link to comment
22 minutes ago, itimpi said:

Parity cannot fix a disk showing as unmountable.  
 

The correct handling of unmountable disks is covered here in the online documentation accessible via the ‘Manual’ link at the bottom of the GUI or the DOCS link at the top of each forum page.

 

What parameters should I use for "xfs_repair"?

I checked both disks using "-n" and received the output below.

 

Please note that one of the disks was rebuilding before it got disabled again and became unmountable. Could this have corrupted my files?

 

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.
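
A note on reading this output: with "-n" the tool makes no changes, and here it stopped in Phase 1 because fixing the primary superblock would need a write. The dry run therefore says nothing yet about the later phases. A small Python sketch that summarises this from the saved text (the function name and the dictionary keys are hypothetical; the matched phrases are the ones in the output above):

```python
def summarize_dry_run(output):
    """Summarise what an `xfs_repair -n` run reported about the superblock."""
    text = output.lower()
    return {
        # Primary superblock failed verification.
        "primary_superblock_bad": "bad primary superblock" in text,
        # A usable backup copy of the superblock was found.
        "secondary_superblock_ok": "verified secondary superblock" in text,
        # The check aborted early, so later phases were never examined.
        "stopped_early": "cannot proceed further in no_modify mode" in text,
    }
```

When `stopped_early` is true, the only way to learn more is to run the repair without "-n".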

 

tower-diagnostics-20221016-0042.zip

Link to comment

Are you running the repair from the GUI or the command line? If the latter, what was the exact command you used? I am checking because many people get the command slightly wrong on the command line.

 

To actually do a repair, you remove the -n option, and if it subsequently asks for it, you add the -L option.

Link to comment
3 hours ago, itimpi said:

Are you running the repair from the GUI or the command line? If the latter, what was the exact command you used? I am checking because many people get the command slightly wrong on the command line.

 

To actually do a repair, you remove the -n option, and if it subsequently asks for it, you add the -L option.

 

I'm running the GUI.

I removed the -n and got the output below. Should I proceed with "-L"?

 

Disk 7

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Disk 9


Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
writing modified primary superblock
sb root inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 128
resetting superblock root inode pointer to 128
sb realtime bitmap inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 129
resetting superblock realtime bitmap inode pointer to 129
sb realtime summary inode value 18446744073709551615 (NULLFSINO) inconsistent with calculated value 130
resetting superblock realtime summary inode pointer to 130
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
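
The error text itself spells out the order of operations: try to mount first so the log can be replayed, and only fall back to "-L" if mounting is impossible. A minimal Python sketch of that decision, assuming you have the saved output and already know whether a mount attempt worked (the function name and the returned labels are hypothetical):

```python
DIRTY_LOG_MARKER = "valuable metadata changes in a log"


def next_repair_step(output, mount_succeeded):
    """Pick the next step after an xfs_repair run, following the tool's own advice."""
    # Collapse line wrapping so the marker matches even if the message is reflowed.
    flat = " ".join(output.split())
    if DIRTY_LOG_MARKER not in flat:
        return "no-dirty-log-error"
    if mount_succeeded:
        # Mounting replays the log; unmount, then rerun the repair without -L.
        return "unmount-and-rerun-without-L"
    # The disk cannot be mounted, so the log cannot be replayed.
    # -L discards it, which may lose the most recent metadata changes.
    return "rerun-with-L"
```

For both disks here the filesystem is unmountable, so the sketch lands on `rerun-with-L`, with the caveat from the message that destroying the log may cause corruption.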

 

Link to comment
31 minutes ago, JorgeB said:

Yes, it's the only option since the disks cannot be mounted to clear the log.

 

Thank you for your cooperation.

Done for both disks with no errors. The last 2 lines are

Format log to cycle 4.
done

 

I stopped the array then started it in normal mode as stated in the documentation.

The 2 drives still have a red X next to the disk name and the status is "disabled".

The entire contents of both drives were moved to the "lost+found" share.

 

What should I do next?

 

754825516_unraiderror.jpg.ba8140e492e01ca36b10fcdf75454c29.jpg

 

tower-diagnostics-20221016-1244.zip

Link to comment

At this point the repair has been run against the ‘emulated’ drive and the physical disabled drive is untouched.

 

If you look at the contents of the Lost+Found folder do you think you can sort the contents out or not?   Entries being put there means the repair process could not locate their directory entry to give them the correct name.

 

If the contents look like too much work to resolve, what is the state of your backups?

 

Keep the physical ‘disabled’ drives intact for now, as depending on your answers we may recommend different ways forward.

Link to comment
7 minutes ago, JorgeB said:

In that case it might be better to rebuild parity instead, post current diags first.

 

I posted the latest diags in the previous reply after starting the array in normal mode.

 

7 minutes ago, itimpi said:

At this point the repair has been run against the ‘emulated’ drive and the physical disabled drive is untouched.

 

If you look at the contents of the Lost+Found folder do you think you can sort the contents out or not?   Entries being put there means the repair process could not locate their directory entry to give them the correct name.

 

If the contents look like too much work to resolve, what is the state of your backups?

 

Keep the physical ‘disabled’ drives intact for now, as depending on your answers we may recommend different ways forward.

 

The Lost+Found folder contains folders and files with random numbers.

The folders contain my files with their original names and the correct file extensions ✔️. The loose files are just renamed to numbers with no extension.
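
For the numbered files with no extension, one way to sort them is to look at the first few bytes of each file and guess the type from well-known signatures. The following is a minimal Python sketch under that assumption. The signature table is deliberately short and only a starting point, the helper names are hypothetical, and it only proposes new names without renaming anything.

```python
from pathlib import Path

# A few well-known file signatures ("magic bytes"). Extend as needed.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".zip"),        # also the container for docx/xlsx and similar
    (b"\x1a\x45\xdf\xa3", ".mkv"),  # EBML header, used by Matroska and WebM
]


def guess_extension(header):
    """Guess an extension from the first bytes of a file, or None if unknown."""
    for magic, ext in SIGNATURES:
        if header.startswith(magic):
            return ext
    # ISO media files (MP4, MOV, M4A, ...) carry 'ftyp' at byte offset 4.
    if header[4:8] == b"ftyp":
        return ".mp4"
    return None


def suggest_names(folder):
    """Map each extensionless file in folder to a suggested name. Nothing is renamed."""
    suggestions = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and not path.suffix:
            with path.open("rb") as handle:
                ext = guess_extension(handle.read(16))
            if ext:
                suggestions[path.name] = path.name + ext
    return suggestions
```

Calling `suggest_names` on a copy of the Lost+Found folder gives a list to review before renaming anything by hand. Files that come back without a suggestion need a closer look or a restore from backup.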

 

I don't have a 1:1 backup of all the files, as most of them are available online. I have backups of some personal folders.

 

I plan to do the following; please let me know what you think.

- I will copy the entire Lost+Found folder (around 11.5 TB) to an external drive.

- I will rebuild parity after your confirmation.

- If there are any corrupted or missing files, I will restore them from the backups or re-download them.

 

I appreciate your advice.

 

293873672_unraidlost.jpg.b694af51f1c1f4afd4c98f9e7d8b93af.jpg

Link to comment
34 minutes ago, HAMANY said:

I posted the latest diags in the previous reply after starting the array in normal mode.

Yes, sorry. Both disks look healthy. With the array stopped, unassign both, start the array, stop the array, then see if both unassigned disks mount with the UD plugin. If they do, check that the contents look OK.

Link to comment
47 minutes ago, JorgeB said:

Yes, sorry. Both disks look healthy. With the array stopped, unassign both, start the array, stop the array, then see if both unassigned disks mount with the UD plugin. If they do, check that the contents look OK.

 

Thanks. 

Should I copy my data before doing these steps?

Link to comment
1 hour ago, JorgeB said:

No need, that won't change anything for now.

 

One of them (sdm) mounts fine, and I can see my data in its original structure.

The other one (sde) does not mount. I get the output below when I click "file system check".

Diags attached.

 

There is a button called "Run with correct flag", should I try it?

 

FS: xfs

Executing file system check: /sbin/xfs_repair -n /dev/sde1 2>&1

Phase 1 - find and verify superblock...
bad primary superblock - bad CRC in superblock !!!

attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

File system corruption detected!

 

tower-diagnostics-20221016-1555.zip

Link to comment

If you rebuild the disks you will get exactly what is shown with the emulated disks, with all that lost+found.

 

On 10/16/2022 at 5:53 AM, JorgeB said:

In that case it might be better to rebuild parity instead

If you rebuild parity instead, the drives will have their current contents as seen when you mount them with Unassigned Devices.
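
To make the difference concrete, here is a toy Python model of single XOR parity. It is only a sketch: the disk contents are made-up byte strings, the names are hypothetical, and this thread has two emulated disks at once, which suggests dual parity, where the second parity is not a plain XOR. The idea carries over though: a repair on an emulated disk only changes parity, so rebuilding the disk reproduces the repaired (lost+found) view, while rebuilding parity keeps whatever is physically on the disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def emulate(parity, other_disks):
    """Reconstruct the missing disk from parity plus the remaining disks."""
    return xor_blocks([parity, *other_disks])


# Toy array: two healthy data disks and one that gets disabled.
disk1 = b"AAAA"
disk2 = b"BBBB"
disk3_physical = b"CCCC"  # what is actually on the disabled drive
parity = xor_blocks([disk1, disk2, disk3_physical])

# A repair on the emulated disk rewrites its contents; only parity absorbs
# the change, because the physical disk3 is out of the array.
disk3_after_repair = b"CCXX"
parity = xor_blocks([disk1, disk2, disk3_after_repair])

# Option 1, rebuild the disk: it receives the emulated (repaired) contents.
rebuilt_disk = emulate(parity, [disk1, disk2])

# Option 2, rebuild parity: the physical contents are kept as they are and
# parity is recomputed around them.
rebuilt_parity = xor_blocks([disk1, disk2, disk3_physical])
```

In this toy model `rebuilt_disk` ends up equal to `disk3_after_repair`, and emulating from `rebuilt_parity` gives back `disk3_physical`, which mirrors the two outcomes described above.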

Link to comment
2 minutes ago, trurl said:

If you rebuild the disks you will get exactly what is shown with the emulated disks, with all that lost+found.

 

If you rebuild parity instead, the drives will have their current contents as seen when you mount them with Unassigned Devices.

 

How can I choose between rebuilding the disks or parity?

Rebuilding the parity is more suitable for me.

Link to comment
