"Disk X in error state (disk dsbl)" during shrinking of array - throwing errors - best course of action?


'Evening folks!

I've been shuffling data around in the array so I can shrink it and remove three 10+ year old HDDs.
I'm following the tried and true "Remove Drives Then Rebuild Parity" method.

 

  • Moved what I needed to keep elsewhere in the array, then cleared all the drives of data.
  • Stopped the array
  • Tools --> New Config --> Retain current configuration (All) --> Confirm yes
  • Started the array without checking "Parity is already valid".
  • It started rebuilding the parity

 

After only a few minutes Disk 6 started spitting out errors, and the count kept rising until I paused the parity rebuild a couple of minutes later.
 

1804331309_Skrmavbild2023-12-07kl_22.19_22.thumb.png.368c76c3602163d7af60d08e2029c4dd.png

 

The HGST drive is new to the array. It was added a couple of days ago as a replacement for a smaller, fully functioning data drive.

 

It is, however, a newly bought refurbished drive, dated 2017/2018. It passed pre-clear before I added it to the array.

I suspect the SATA cable or the power splitter is the culprit but can't be sure. The SATA power splitter seems a bit less likely, although this new HGST drive does draw more power than the Seagate it replaced.

 


- Can I, at this stage of the shrinking process, "safely" (I know I no longer have valid parity) power down the server and swap some cables? Can I also physically remove the old array disks that are now unassigned?

- After I have rebooted the server, should I run a short SMART self-test? Since I no longer have valid parity, I can't start the array without the drive and run another pre-clear on it, so SMART tests are basically all I can do.
 

Any advice for me here? Am I missing something? What is my best course of action at this point forward (other than restoring from backup)?

 


Note:

  • Drives are connected via SAS->SATA breakout cables to an Adaptec ASR-71605 in HBA mode.
  • Since unRAID has disabled the drive I can't see it in the maxView Storage Manager for the Adaptec card either. 
  • unRAID OS is 6.12.4

 

 

All the best! /Yivey

define7-diagnostics-20231207-2209.zip

Link to comment
12 hours ago, Yivey_unraid said:

- Can I, at this stage of the shrinking process, "safely" (I know I no longer have valid parity) power down the server and swap some cables? Can I also physically remove the old array disks that are now unassigned?

Yes.

 

You would have to start rebuilding parity from the beginning.

 

12 hours ago, Yivey_unraid said:

After I have rebooted the server, should I run a short SMART self-test? Since I no longer have valid parity, I can't start the array without the drive and run another pre-clear on it, so SMART tests are basically all I can do.

No harm, but it will probably not help if the cabling is the problem, as the test is completely internal to the drive and does not talk to the server while it runs. If you actually want to test the drive, then the Extended SMART test is the one you want (it takes MUCH longer), and if the drive fails that, I would think it is enough for an RMA.
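
For anyone who prefers the command line to the GUI, here is a minimal sketch of starting the extended test with `smartctl` from smartmontools. The device path `/dev/sdX` is a placeholder, and the helper names are made up for illustration. `-t long` starts the extended self-test and `-l selftest` shows the self-test log once it has finished.

```python
import shutil
import subprocess


def smart_test_command(device: str, extended: bool = True) -> list[str]:
    """Build the smartctl command that starts a self-test on `device`."""
    test_type = "long" if extended else "short"
    return ["smartctl", "-t", test_type, device]


def selftest_log_command(device: str) -> list[str]:
    """Build the smartctl command that prints the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run_smartctl(command: list[str]) -> str:
    """Run a smartctl command and return its text output (needs root)."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl was not found on PATH")
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout


# Hypothetical usage; replace /dev/sdX with the real device:
# print(run_smartctl(smart_test_command("/dev/sdX")))
# ...wait for the test to finish, then:
# print(run_smartctl(selftest_log_command("/dev/sdX")))
```

The test runs inside the drive, so the command returns right away and the result has to be read from the log later.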

 

12 hours ago, Yivey_unraid said:

It could be the SATA power splitter,

 

If using SATA->SATA power splitters, do not split more than 2 ways, as the master SATA connector is limited in the maximum current it can carry without voltage sag occurring. If using Molex->SATA, you can normally split 4 ways, as the pins are much more robust.
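
The reasoning behind that rule of thumb can be sketched as simple arithmetic. All the figures below are assumptions for illustration, not measurements: a SATA power connector is commonly rated at 1.5 A per pin with three pins on the 12 V rail, a Molex pin is often quoted at around 11 A, and a 3.5" HDD is assumed to peak at roughly 2 A on 12 V during spin-up. Check the datasheets for your own drives and connectors.

```python
import math


def max_drives_per_connector(connector_limit_amps: float, drive_peak_amps: float) -> int:
    """How many drives one source connector can feed at peak 12 V draw."""
    if connector_limit_amps <= 0 or drive_peak_amps <= 0:
        raise ValueError("currents must be positive")
    return math.floor(connector_limit_amps / drive_peak_amps)


# Assumed figures (see the text above); adjust to your hardware.
SATA_12V_LIMIT_A = 3 * 1.5   # three 12 V pins at 1.5 A each
MOLEX_12V_LIMIT_A = 11.0     # single 12 V pin, commonly quoted rating
DRIVE_SPINUP_12V_A = 2.0     # assumed spin-up peak for a 3.5" HDD

sata_max = max_drives_per_connector(SATA_12V_LIMIT_A, DRIVE_SPINUP_12V_A)
molex_max = max_drives_per_connector(MOLEX_12V_LIMIT_A, DRIVE_SPINUP_12V_A)
print(f"SATA source: {sata_max} drives, Molex source: {molex_max} drives")
```

With these assumed numbers the SATA source supports 2 drives and the Molex source 5, so the advice of 2 and 4 ways leaves some headroom on the Molex side.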

Link to comment

Thank you for your answers!

 

13 hours ago, itimpi said:

If using SATA->SATA power splitters, do not split more than 2 ways, as the master SATA connector is limited in the maximum current it can carry without voltage sag occurring. If using Molex->SATA, you can normally split 4 ways, as the pins are much more robust.

 

They're 4-way splitters attached directly to the 6-pin output on the PSU. Original Seasonic cables. Not ideal, but not terrible either.

 

12 hours ago, JorgeB said:
Dec  7 22:04:54 Define7 kernel: sd 1:1:14:0: Device offlined - not ready after error recovery

 

Disk dropped offline, check/replace cables, power and SATA.
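
A small sketch for spotting this kind of event in a saved syslog (for example the one inside the diagnostics zip). The regular expression is written against the single line quoted above and is an assumption about the general format; other kernel versions may word it differently.

```python
import re

# Matches lines like:
# "Dec  7 22:04:54 Define7 kernel: sd 1:1:14:0: Device offlined - not ready after error recovery"
OFFLINED = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"sd\s+(?P<scsi_addr>\d+:\d+:\d+:\d+):\s+Device offlined"
)


def find_offlined_devices(lines):
    """Return (timestamp, SCSI address) for every 'Device offlined' line."""
    hits = []
    for line in lines:
        match = OFFLINED.match(line)
        if match:
            hits.append((match.group("timestamp"), match.group("scsi_addr")))
    return hits


sample = [
    "Dec  7 22:04:54 Define7 kernel: sd 1:1:14:0: Device offlined - not ready after error recovery",
]
print(find_offlined_devices(sample))
```

The SCSI address (here `1:1:14:0`) is what ties the message back to a specific port on the controller.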


Looks like it actually was the drive that was the problem. I think. I restarted the server after moving the cabling around and still got multiple errors. See the attached SMART report. Now, perhaps those CRC errors and Current Pending Sector errors are a result of the initial problem and are now causing the drive to behave irrationally. I don't know.
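
For reading a saved SMART report like the attached one, here is a minimal sketch that pulls the raw values of a few attributes out of the attribute table printed by `smartctl -A`. It assumes the usual ten-column table layout with the raw value in the last column; the set of watched attributes is just an illustrative choice. As a general rule, `UDMA_CRC_Error_Count` usually points at the link (cable or connector), while `Current_Pending_Sector` usually points at the media itself.

```python
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def watched_raw_values(report_text: str) -> dict[str, int]:
    """Extract raw values of the watched attributes from a SMART report."""
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = fields[9]
            if raw.isdigit():
                found[fields[1]] = int(raw)
    return found


# Hypothetical usage with a saved report file:
# with open("smart_report.txt", encoding="utf-8", errors="replace") as fh:
#     print(watched_raw_values(fh.read()))
```

Comparing the values before and after a cable swap shows whether the CRC count is still climbing.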

At first it didn't show up at all when connected over the HBA. I put it in an external USB enclosure, which made it show up, but with no file system and no way to mount it. I added it back to the server and tried it with a SATA cable connected directly to the motherboard, and that worked. Somewhat.
The drive wouldn't mount normally, and after googling the error messages from the syslog I tried mounting it read-only in UD, and that finally worked.

Now I'm copying the 4-5TB of data off the drive to an SSD. I figured that would still be faster than restoring from backup. (I really need to set up some cataloging system for which files are on which drive and keep that list somewhere. It's been on my mind for a long time and would save so much time in the event of a failure.) More Current Pending Sector errors keep appearing during the file transfer as well.
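
A minimal sketch of such a catalog, assuming the array disks are mounted at `/mnt/disk1`, `/mnt/disk2` and so on, as Unraid does. The function names and the output location are made up for illustration. It walks each disk and writes one CSV row per file with the disk name, relative path and size.

```python
import csv
import os
from pathlib import Path


def build_catalog(disk_roots):
    """Return sorted (disk, relative path, size in bytes) rows for all files."""
    rows = []
    for root in disk_roots:
        root = Path(root)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = Path(dirpath) / name
                try:
                    size = full.stat().st_size
                except OSError:
                    continue  # skip files that cannot be read
                rows.append((root.name, str(full.relative_to(root)), size))
    rows.sort()
    return rows


def write_catalog(rows, out_path):
    """Write the catalog rows to a CSV file with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["disk", "path", "size_bytes"])
        writer.writerows(rows)


# Hypothetical usage on the server; the output path is only an example:
# disks = sorted(Path("/mnt").glob("disk[0-9]*"))
# write_catalog(build_catalog(disks), "/mnt/user/backups/disk_catalog.csv")
```

Keeping the resulting file somewhere off the array means it is still readable when a disk has dropped out.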

I won't redeploy the drive, if ever, until I've run several more tests. Maybe I should just call it DOA and return it.

I have another drive of the same model, bought together with the failing one. I'm a bit hesitant to deploy that one now too, but at least it has also been pre-cleared.

 

 


Now, regarding the array. Would it be best now, since I've got no parity, to just not assign any drive to that spot and let it rebuild parity? Or doesn't that work since one device would be missing anyway and it would have no valid parity for it? 
Should I instead do a New Config again, following the same steps as in my first post, and not assign anything to Disk 6?
In that case I can expand the array with the other HGST 10TB, after parity is rebuilt and then transfer the data from the SSD.

Thoughts?


623864070_Skrmavbild2023-12-09kl_01_14_14.thumb.png.e943f7b27bb63c9d6c85c73fe13fdb79.png
 

HUH721010ALE601_2TGD77RD-20231209-0043.txt define7-diagnostics-20231209-0120.zip

Link to comment
8 hours ago, Yivey_unraid said:

They're 4-way splitters attached directly to the 6-pin output on the PSU. Original Seasonic cables. Not ideal, but not terrible either.

In that case, isn't it only the ends plugged into the drives that are SATA, with the other end being the PSU-specific connector? If so, that is not what I would normally call a power splitter, as the cables that come with the PSU will be rated for the required current.

Link to comment
8 minutes ago, itimpi said:

In that case, isn't it only the ends plugged into the drives that are SATA, with the other end being the PSU-specific connector? If so, that is not what I would normally call a power splitter, as the cables that come with the PSU will be rated for the required current.

Yes, that's correct! Maybe wrong wording on my part there. It seems it wasn't the power cables anyway.

Any take on this?

  

8 hours ago, Yivey_unraid said:

Now, regarding the array. Would it be best now, since I've got no parity, to just not assign any drive to that spot and let it rebuild parity? Or doesn't that work since one device would be missing anyway and it would have no valid parity for it? 
Should I instead do a New Config again, following the same steps as in my first post, and not assign anything to Disk 6?
In that case I can expand the array with the other HGST 10TB, after parity is rebuilt and then transfer the data from the SSD.

 

Edited by Yivey_unraid
Link to comment
8 hours ago, Yivey_unraid said:

Should I instead do a New Config again, following the same steps as in my first post, and not assign anything to Disk 6?
In that case I can expand the array with the other HGST 10TB, after parity is rebuilt and then transfer the data from the SSD.

Thoughts?

This would be the way to go, as you do not want to retain any content from disk6. Note that if you use New Config you can add the new 10TB drive to the array immediately and build parity with it present. It would still need to be formatted after being added to make it ready for use, but this can be done at any point (even while parity is building).
