Disk issues keep reappearing


Hello, I am kind of new to troubleshooting on Linux-based OSes.

 

I have had an Unraid box running for about 2 years with little to no issues, but now disk 8 in my array keeps being disabled.

 

The following steps have been tried:

  • New config
  • new controller/cables
  • replaced disk (i only have very used disks, but plan to replace as soon as i can afford it)
  • reformat disk/preclear it

 

I have tried looking at the logfiles, but I am a little lost as to where to find the right ones and what to look for.

 

Please help and please let me know how i can make helping easier

Screenshot 2024-01-07 150723.png

Link to comment
1 hour ago, trurl said:

Wish you had asked before doing anything. Most things you tried were the wrong things to do. 

 

Looks like disk 5 is empty or nearly so. Is it supposed to be like that?

 

Attach Diagnostics to your NEXT post in this thread. 

No worries :) it fixed it when i had the issue last, as mentioned all my disks are heavily used and i have plenty of them

Can you elaborate on it being the wrong steps?

 

Yes, i am nearing my storage cap, and will need to expand soon

 

Attached is my diagnostics

 

Thank you for helping :)

hera-diagnostics-20240107-2006.zip

Link to comment

Is disk5 supposed to be nearly empty? And nothing assigned as disk6?

 

Since you rebooted after disk8 became disabled, we can't see anything in syslog about why. The SMART report for disk8 looks mostly OK except for a large number of UDMA CRC errors, which point to a bad connection and are probably the reason it was disabled.

 

Take a look at your Dashboard page. It should be showing you SMART ( 👎) warnings for several of your disks. Post a screenshot of that.

Link to comment

I have reviewed SMART for each of your array disks. Fortunately all their SMART warnings are UDMA CRC, which indicate bad connections. In particular, disks 5, 7, and 8.

 

How are these disks powered? Any power splitters involved?

 

First thing you should do is check all disk connections, SATA and power, both ends, including splitters.

 

Then reboot, post new diagnostics, and we can continue to work on disk8 as well as a few other problems you have.
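
For anyone wanting to pull the CRC count out of a SMART report themselves, here is a minimal Python sketch. It assumes the report is the attribute table printed by `smartctl -A` and that the drive reports CRC errors under attribute ID 199, which is the usual ID but is vendor-dependent. The `SAMPLE` text and its numbers are made up for illustration and are not taken from the diagnostics in this thread.

```python
from typing import Optional


def udma_crc_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the first is the ID, the last is RAW_VALUE.
        if len(fields) >= 10 and fields[0] == "199" and fields[9].isdigit():
            return int(fields[9])
    return None


# Hypothetical sample in the layout of `smartctl -A` output (values invented).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1342
"""

print(udma_crc_count(SAMPLE))  # 1342
```

The raw count is normally cumulative and does not reset, so the useful comparison is whether it keeps increasing after cables have been reseated or replaced.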

Link to comment
1 hour ago, Couch said:

Can you elaborate on it being the wrong steps?

 

6 hours ago, Couch said:

New config

This forces Unraid to accept the disks just as they are, and (optionally) rebuild parity.

 

If the disk had truly failed there would have been no way to rebuild it, so all of its data would be lost. Rebuilding data disks is the whole reason you have parity. If you're going to New Config every time a disk gets disabled, you might as well not have parity at all and forget about recovering any data.

 

And even if the physical disk could still be used with its contents in the New Config, it probably wouldn't agree with parity, so if you didn't let parity rebuild, the array would be out of sync.

 

And, while a disk is disabled, Unraid is still emulating it from parity. It is possible many files could have been written to the emulated disk. By rebuilding parity instead of rebuilding the data disk, all those writes would be lost since they were not on the physical disk.

 

It's even possible that some of the lost writes would be filesystem metadata, and so could result in corruption of the filesystem of the physical disk.

 

6 hours ago, Couch said:

reformat disk/preclear it

Format is a write operation. If you format a disk while it's still in the array, parity is updated just as with any write operation. So, after formatting a disk in the array, parity agrees that it is a formatted disk with an empty filesystem, so any data it might have had can't be recovered from parity rebuild.

 

If instead you did this outside the array, it was mostly pointless.

 

Assuming you did it in the order stated, formatted then precleared, then the format would be totally pointless since it would be cleared.

 

If instead you precleared then formatted, the only way it would be accepted into the array like that is if you did New Config to force it. If you want a formatted disk in the array, you should format it in the array.

 

Unraid only requires a clear disk when adding it to a new slot in an array that already has valid parity. This is so parity will remain valid since a clear disk (all zeros) has no effect on parity. If you add a disk to a new slot in the parity array, Unraid will clear it if it hasn't been precleared, then you can format it in the array so it can accept files. Clearing a disk has nothing at all to do with disks that are already part of the array, though some will use preclear to test a new disk before using it as a replacement.
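
To illustrate why a cleared disk leaves parity valid, here is a small Python sketch. It assumes single parity computed as a bytewise XOR across the data disks. The four-byte "disks" and the helper name `xor_parity` are hypothetical and only stand in for real drives.

```python
from functools import reduce


def xor_parity(disks: list[bytes]) -> bytes:
    """Bytewise XOR across equally sized disks (assumed single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
parity = xor_parity([disk1, disk2])

# A cleared disk is all zeros, and x ^ 0 == x, so adding it changes nothing.
cleared = bytes(4)
parity_with_cleared = xor_parity([disk1, disk2, cleared])
print(parity_with_cleared == parity)  # True

# A disk with any non-zero content would make the existing parity wrong.
uncleared = bytes([0xFF, 0x00, 0x00, 0x00])
parity_with_uncleared = xor_parity([disk1, disk2, uncleared])
print(parity_with_uncleared == parity)  # False
```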

 

Link to comment
On 1/7/2024 at 8:32 PM, trurl said:

I have reviewed SMART for each of your array disks. Fortunately all their SMART warnings are UDMA CRC, which indicate bad connections. In particular, disks 5, 7, and 8.

 

How are these disks powered? Any power splitters involved?

 

First thing you should do is check all disk connections, SATA and power, both ends, including splitters.

 

Then reboot, post new diagnostics, and we can continue to work on disk8 as well as a few other problems you have.

Okay, i checked connectors and they all seem to be plugged in right, i am using 1 splitter for 2 disks, but they are not the disks with issues

 

Power-supply is an older used one, so it might have bad connectors

hera-diagnostics-20240109-1029.zip

 

Attached is the new diagnostics :)

 

Again thank you for your help and your guidance

 

Link to comment

Disks 1,2,3,4,7 all mount and have data. Disk5 also mounted but is empty or nearly so. Nothing assigned as disk6. And, disabled/emulated disk8 mounts and has data so that is good.

 

You didn't answer this question:

On 1/7/2024 at 2:20 PM, trurl said:

Is disk5 supposed to be nearly empty? And nothing assigned as disk6?

 

Not clear from your initial description whether or not you let parity rebuild when you did New Config, but I assume parity must be valid since disk8 is being emulated just fine.

 

The contents of emulated disk8 will be the result of rebuilding disk8.

 

Unraid disables a disk when a write to it fails for any reason. This is because it is no longer in sync with the array. After a disk becomes disabled, it isn't used again until rebuilt (or you force it with New Config).

 

Instead, Unraid emulates the disk. Any reads of the disabled disk instead read all other disks and get its data from the parity calculation. Any writes to the disabled disk instead update parity as if the disk had been written. The initial failed write, and any subsequent writes to the disabled disk, can be recovered by rebuilding.
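
A rough Python sketch of that emulation, assuming single XOR parity and a toy array of just two data disks. The names `emulated_read` and `emulated_write` are made up for illustration and are not Unraid functions.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def emulated_read(parity: bytes, other_disks: list[bytes]) -> bytes:
    """Reconstruct the disabled disk from parity plus all other disks."""
    return xor_blocks(parity, *other_disks)


def emulated_write(new_data: bytes, other_disks: list[bytes]) -> bytes:
    """Return the parity that results from 'writing' new_data to the disabled disk."""
    return xor_blocks(new_data, *other_disks)


disk7 = b"AAAA"
physical_disk8 = b"BBBB"  # contents at the moment disk8 was disabled
parity = xor_blocks(disk7, physical_disk8)

print(emulated_read(parity, [disk7]))  # b'BBBB'

# A write while disabled only touches parity; the physical disk stays stale.
parity = emulated_write(b"CCCC", [disk7])

# Rebuilding the disk copies the emulated contents, so the write survives.
rebuilt_disk8 = emulated_read(parity, [disk7])
print(rebuilt_disk8)  # b'CCCC'

# New Config plus parity rebuild recomputes parity from the stale physical disk,
# so the write made during emulation is gone.
parity_after_new_config = xor_blocks(disk7, physical_disk8)
print(emulated_read(parity_after_new_config, [disk7]))  # b'BBBB'
```

The last step is the same situation described earlier in the thread for New Config: rebuilding parity instead of the data disk discards whatever was written to the emulated disk.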

 

If we are reasonably confident everything is working well, usually we will just say rebuild on top of the same disk. But, it might be safer to rebuild to a new disk and keep the original with its existing contents in case of problems rebuilding.

 

Do you have another disk you can use for rebuilding disk8?

Link to comment
Very sorry

I do not know why I missed that. Disk 5 has been replaced recently, so I had to move stuff away from it temporarily.

 

Disk 6 missing is on me, i put the disks in 7-8 by mistake, but have not bothered moving them.

 

Yes i let it rebuild every time there is anything, and it runs parity checks weekly

 

 

 

Yes, i will try rebuilding hopefully over the weekend when i have a bit of time :)

 

Thank you very much!

Link to comment
40 minutes ago, Couch said:

disk 5 has been replaced recently, so i had to move stuff away from it temporarily.

No good reason to move data off a disk you are going to rebuild. The whole point of rebuild is so the replacement will have all the data the original had. And if the original disk was actually disabled, there are some good reasons to not move data off the emulated disk, since all the other disks have to get involved emulating the data for the disk.

 

40 minutes ago, Couch said:

it runs parity checks weekly

Most only do monthly or even less frequently. Parity is realtime so check is just a check that parity is still in sync. But 4TB parity check should take much less than a day.
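
As a rough check on that estimate, here is a back-of-the-envelope calculation in Python. The average speeds are assumptions chosen for illustration, not measurements from this system. Real checks vary with the slowest disk and the controller.

```python
def parity_check_hours(parity_size_tb: float, avg_speed_mb_s: float) -> float:
    """Estimate parity check duration from parity size and an assumed average speed."""
    size_bytes = parity_size_tb * 1e12          # decimal terabytes, as drives are sold
    seconds = size_bytes / (avg_speed_mb_s * 1e6)
    return seconds / 3600


for speed in (100, 150, 200):                   # assumed average MB/s
    print(f"{speed} MB/s -> {parity_check_hours(4, speed):.1f} h")
# 100 MB/s -> 11.1 h
# 150 MB/s -> 7.4 h
# 200 MB/s -> 5.6 h
```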

Link to comment
12 minutes ago, trurl said:

And if the original disk was actually disabled, there are some good reasons to not move data off the emulated disk, since all the other disks have to get involved emulating the data for the disk.

If you are concerned for the data and worried about rebuild, then the better approach would be to COPY not MOVE the data somewhere OFF the array. That way you don't modify the original disk, and you don't modify parity and the other array disks while your array is compromised.

 

And if you are concerned for the data, best approach is to have another copy of anything important and irreplaceable on another system. Parity is not a substitute for backup.

Link to comment
22 minutes ago, itimpi said:

This is probably excessive. More common is running them monthly, or even quarterly.

Ah, well i might have to adjust a bit then :)

 

20 minutes ago, trurl said:

No good reason to move data off a disk you are going to rebuild. The whole point of rebuild is so the replacement will have all the data the original had. And if the original disk was actually disabled, there are some good reasons to not move data off the emulated disk, since all the other disks have to get involved emulating the data for the disk.

 

Most only do monthly or even less frequently. Parity is realtime so check is just a check that parity is still in sync. But 4TB parity check should take much less than a day.

Moved due to me not having spare disks at the time, so i had to collect my data between disks.

 

 

 

And i am aware it does not work as a backup :)

 

Thank you ❤️

Link to comment
On 1/9/2024 at 5:15 PM, trurl said:

If you are concerned for the data and worried about rebuild, then the better approach would be to COPY not MOVE the data somewhere OFF the array. That way you don't modify the original disk, and you don't modify parity and the other array disks while your array is compromised.

 

And if you are concerned for the data, best approach is to have another copy of anything important and irreplaceable on another system. Parity is not a substitute for backup.

Just making sure I got the steps correct here, since my other moves were not the greatest

 

  1. backup data
  2. shut down array, remove bad disk and add other disk, start array
  3. let it rebuild
  4. shutdown array
  5. create new config and assign disks back where they belong
  6. rebuild array

am i missing something?

 

 

Link to comment

Maybe there is some confusion, could be me.

 

When you assign the disk to the same slot as the removed disk and start the array, the new disk is rebuilt so it contains the contents of the missing disk.

 

Have no idea why you think another New Config and rebuild is needed after that.

Link to comment
Ahh i might have misunderstood then, so i just replace the disk and let it rebuild, nothing else?

 

Also nothing has changed yet, as my backup is at 95% currently, and i am waiting for that to finish before making changes

 

Link to comment
  • Solution

Okay, backup done and solution seems to have worked

Steps taken:

 

  1. backed up system and shut it down
  2. replaced unresponsive disk with another one
  3. started up system and ran pre-clear on disk
  4. assigned disk to now empty spot and started array
  5. rebuilt disk

 

so far it looks to be solved, thank you so much for the help! ❤️

Link to comment
