cache pool errors - failing drive or ?? - General Support

June 6

Recently, my cache drive has begun to report errors. I thought perhaps the NVMe drive was beginning to fail (Samsung 980 Pro 1TB), but it passes self tests.
I don't want to start a warranty claim process if it's not the drive. Any way to find out what's happening? The error messages aren't very helpful, at least not to me.
See attached screenshots.
EDIT: SMART report added also.
EDIT2: added obligatory diagnostics. ;)

EDIT3/Resolution: it was the drive. Samsung replaced it under warranty.

tower-smart-20260606-1008.zip tower-diagnostics-20260606-1025.zip

Edited June 24 by Elmojo
resolution

June 6

Author

crickets....
No one? 😅

June 6

Solution

Your drive is failing. The self-test passing doesn't mean it's healthy – it only checks if the drive responds, not whether all sectors are readable.

Look at the SMART error log: 399+ Unrecovered Read Errors, all pointing to the same LBA range. That's permanent bad sectors. Back up your data now, this can fail completely without any further warning.
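
To check whether the failing addresses really sit in one region, the LBAs from the error log can be grouped into ranges. A minimal Python sketch, assuming the LBAs were copied out of the SMART report by hand; the function name, the `max_gap` parameter, and the sample values are hypothetical and not taken from the attached report.

```python
def cluster_lbas(lbas, max_gap=8):
    """Group LBAs into (start, end) ranges.

    An LBA joins the current range when it is within max_gap of that
    range's end; otherwise it starts a new range.
    """
    ranges = []
    for lba in sorted(set(lbas)):
        if ranges and lba - ranges[-1][1] <= max_gap:
            ranges[-1][1] = lba
        else:
            ranges.append([lba, lba])
    return [tuple(r) for r in ranges]


# Hypothetical LBAs, for illustration only.
sample = [1000, 1001, 1003, 1008, 500000, 1002]
print(cluster_lbas(sample))
```

A small number of tight ranges would fit the "same LBA range" reading; errors scattered across the whole drive would suggest something else.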

Frank

June 6

Author

3 minutes ago, Vvei_61 said:
Your drive is failing

Thanks, but I'm not 100% sure I believe that. Browsing the unraid subreddit, it appears that there have been quite a few of us who are suddenly getting "corruption" errors on cache drives/pools since updating to 7.3.
That seems very suspicious to me. It could be that now it's doing a better job of scanning drive health, but having all these drives 'fail' at once, and in the same way, just feels odd.
I will absolutely back up and scan the drive more deeply. I'll also prepare to replace it, just in case.
Speaking of, what's the easiest way to temporarily move everything off the cache and onto the array? Should I just edit the share settings, or is there a better way?
I've been running this machine for several years, but there are still many things for which I'm a total beginner. :)

June 6

The Unraid 7.3 correlation is interesting and worth keeping an eye on. But Unrecovered Read Errors in the SMART log are written by the drive itself, not by Unraid. Software can't generate those entries.
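
The drive-side counters can be read without going through the Unraid GUI. A rough Python sketch, assuming the report was produced with `smartctl -j -a` on an NVMe device and that the health log uses the `media_errors` and `num_err_log_entries` keys; the helper name and the sample numbers are made up for illustration.

```python
import json


def nvme_error_counters(report):
    """Return (media_errors, num_err_log_entries) from a parsed smartctl JSON report.

    Missing keys are treated as 0 so a partial report does not raise.
    """
    log = report.get("nvme_smart_health_information_log", {})
    return log.get("media_errors", 0), log.get("num_err_log_entries", 0)


# Hypothetical excerpt of smartctl JSON output, for illustration only.
sample_json = (
    '{"nvme_smart_health_information_log": '
    '{"media_errors": 12, "num_err_log_entries": 12}}'
)
media, logged = nvme_error_counters(json.loads(sample_json))
print(f"media errors: {media}, error log entries: {logged}")
```

Both counters are kept by the drive's own firmware, so a value that keeps climbing between reports points at the device rather than at the OS release.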

For moving data off the cache: yes, editing the share settings is the easiest way. Set the Use Cache option to "No" or "Prefer Array" on your shares, then run the mover. Everything will transfer to the array automatically.

June 7

Community Expert

It is worth pointing out that if you want to use a self-test on a drive then only the extended self-test is really indicative of health. The short test is primarily about testing the electronics - only a small number of sectors are tested.

June 7

Author

3 hours ago, Vvei_61 said:
Set the Use Cache option to "No" or "Prefer Array" on your shares, then run the mover. Everything will transfer to the array automatically.

Awesome, thanks. It appears that I may need to do that. See below...

2 hours ago, itimpi said:
It is worth pointing out that if you want to use a self-test on a drive then only the extended self-test is really indicative of health.

Good to know, thanks! I ran the long SMART test and it reported "Completed: failed segments". The report is attached.
It seems there are quite a few "Unrecovered Read Error" entries.
I probably need to initiate a warranty replacement with Samsung. Bugger.

tower-smart-20260606-1955.zip

June 7

Community Expert

5 hours ago, Elmojo said:
"Completed: failed segments"

That means the test failed, and the device should be replaced.
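
For anyone scripting this check, a tiny Python sketch of the same reading. It assumes the result is the status text from the self-test log, where a clean run reads "Completed without error"; the function name is hypothetical.

```python
def self_test_passed(result):
    """True only for a completed self-test whose status mentions no failure."""
    text = result.strip().lower()
    return text.startswith("completed") and "fail" not in text


for status in ("Completed without error", "Completed: failed segments"):
    print(f"{status!r}: passed={self_test_passed(status)}")
```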

June 7

Author

17 hours ago, Vvei_61 said:
Set the Use Cache option to "No" or "Prefer Array" on your shares, then run the mover. Everything will transfer to the array automatically.

So...I don't have those options. I can change the Primary Storage from 'cache' to 'array', and leave the secondary storage to 'none'.
However, running mover does nothing. No disk activity is observed, and no files are moved off the cache drive.

June 7

Community Expert

If you want mover to do anything you need to have both primary and secondary storage set and the appropriate mover direction.

June 7

Author

1 hour ago, itimpi said:
If you want mover to do anything you need to have both primary and secondary storage set and the appropriate mover direction.

I'm super confused then. I've had it set that way since the beginning, and it's never moved anything from the cache to the array. I assumed it was because the cache hadn't reached the selected fill level.
That also means that the advice given above by Vvei_61: "Set the Use Cache option to "No" or "Prefer Array" on your shares, then run the mover. Everything will transfer to the array automatically." is incorrect, since I don't even have those options.
So what is the proper process for getting the data off this cache so I can replace it? Someone please walk me through the steps. I'm getting concerned I may lose data. The errors are over 1500 now.

June 8

Community Expert

You can use the steps here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/#findComment-511923

June 8

You need both storages set: Primary = Cache, Secondary = Array, and Move action = Cache → Array. With Secondary set to "none" the mover has nowhere to send the files.

Make sure to do this on all shares that use the failing cache drive, then run the mover.
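
The rule above can be written down as a small decision function. This is only a Python sketch of the logic as described in this thread, not how mover is actually implemented; the function name, argument names, and direction strings are hypothetical.

```python
def mover_action(primary, secondary, direction):
    """Describe what mover would do for one share, per the rule in this thread.

    secondary=None models the "none" setting in the share page.
    """
    if secondary is None:
        return "nothing: no secondary storage, so there is nowhere to move files"
    if direction == "primary->secondary":
        return f"move files from {primary} to {secondary}"
    if direction == "secondary->primary":
        return f"move files from {secondary} to {primary}"
    raise ValueError(f"unknown direction: {direction}")


# The setting that did nothing earlier in the thread.
print(mover_action("array", None, "primary->secondary"))
# The setting recommended for emptying the cache.
print(mover_action("cache", "array", "primary->secondary"))
```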

Frank

June 8

3 minutes ago, Vvei_61 said:
You need both storages set: Primary = Cache, Secondary = Array, and Move action = Cache → Array. With Secondary set to "none" the mover has nowhere to send the files.
Make sure to do this on all shares that use the failing cache drive, then run the mover.

Looking at the top screenshot, your settings actually look correct already - Primary = Cache, Secondary = Array, Move action = Cache → Array.

Did you start the mover manually? It won't run automatically unless scheduled. And did you get any error message when you ran it?

Frank

June 8

Author

7 hours ago, Vvei_61 said:
Looking at the top screenshot, your settings actually look correct already - Primary = Cache, Secondary = Array, Move action = Cache → Array.
Did you start the mover manually? It won't run automatically unless scheduled. And did you get any error message when you ran it?

Yeah, that's why I'm so confused. I have it set as shown, but when I invoke the mover manually, nothing happens. No errors. I get the little unraid 'wave' icon for a moment, then it goes back to the previous screen, like it's done.
It's always done this, so I just assumed that was normal behavior until the cache reached the threshold of fill for when it was supposed to 'spill over' onto the array. If not, then it's never worked correctly since day-one.
How do I troubleshoot this, to figure out why the mover...isn't? lol

June 8

Author

10 hours ago, JorgeB said:
You can use the steps here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/#findComment-511923

Thank you JorgeB! It seems that the first steps of manually stopping the docker and VM services are key.
After doing that, mover is at least reporting as active. However, it's running very slooowwly. I mean like the transfer rates are in the KBs. O.o
Should I just leave it and see what happens? I hate to have my server offline for so long, especially not knowing if this might be a multi-day process at these speeds....
EDIT: I have 725GB of data to move. At the currently reported speeds, this will be a process of weeks. Something has to change.
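
The "weeks" estimate is easy to sanity-check. A quick Python sketch of the arithmetic, using decimal units (1 GB = 1,000,000 KB); the exact reported speed is not given here, so the rates below are assumed values for comparison only.

```python
def transfer_days(total_gb, rate_kb_per_s):
    """Days needed to move total_gb at a steady rate_kb_per_s (decimal units)."""
    total_kb = total_gb * 1_000_000
    seconds = total_kb / rate_kb_per_s
    return seconds / 86_400  # seconds per day


# Assumed rates: a few hundred KB/s, a few MB/s, and roughly 100 MB/s.
for rate in (500, 5_000, 100_000):
    print(f"{rate:>7} KB/s -> {transfer_days(725, rate):.1f} days")
```

At an assumed 500 KB/s, 725 GB works out to a bit under 17 days, so the estimate holds if the speed never improves.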

Edited June 8 by Elmojo
added data total info

June 8

Author

So it appears that the mover was working, it just took a while. It has now moved everything off of cache, except for ONE of my VM folders. The VM manager is disabled, so obviously no VMs are running. Why will this one not move, or how can I find out why it's not? Can I move it manually? If so, how?

I finally got it to move manually.

Now, the lingering issue is that when I restart the docker and VM services, it recreates the appdata folder on the cache. I've confirmed that the appdata share is set to "array" only.
How can I get all the docker containers to stop writing to the cache?

Edited June 9 by Elmojo

June 9

Community Expert

Post new diags please

June 9

Community Expert

You need to check the drive mappings for your containers (and docker itself) to ensure there are no references to /mnt/cache/appdata.
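
Checking every container by hand can be avoided. A Python sketch, assuming the input is the parsed JSON array printed by `docker inspect` (each entry has a `Name` and a `Mounts` list whose items carry a `Source` host path); the helper name and the two sample containers are hypothetical.

```python
def pool_bound_mounts(inspect_output, prefix="/mnt/cache/"):
    """Map container name -> host paths that start with prefix."""
    hits = {}
    for container in inspect_output:
        name = container.get("Name", "").lstrip("/")
        sources = [m.get("Source", "") for m in container.get("Mounts", [])]
        bound = [s for s in sources if s.startswith(prefix)]
        if bound:
            hits[name] = bound
    return hits


# Hypothetical containers, shaped like `docker inspect` output.
sample = [
    {"Name": "/app-one", "Mounts": [{"Source": "/mnt/cache/appdata/app-one"}]},
    {"Name": "/app-two", "Mounts": [{"Source": "/mnt/user/appdata/app-two"}]},
]
print(pool_bound_mounts(sample))
```

Only the containers it lists would need their mappings edited; anything already on /mnt/user/ follows the share settings.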

June 9

Author

1 hour ago, itimpi said:
You need to check the drive mappings for your containers (and docker itself) to ensure there are no references to /mnt/cache/appdata.

Oh lordy, individually for each container?!
And I guess I'll have to map them all back once I've replaced the cache drive?
This really feels like something that should be done automatically when the share settings are changed...

7 hours ago, JorgeB said:
Post new diags please

Attached...

tower-diagnostics-20260609-0937.zip

June 9

Community Expert

Shares are correct, so it's most likely what itimpi mentioned.

June 9

Author

12 minutes ago, JorgeB said:
Shares are correct, so it's mostly what Itimpi mentioned.

Well that sucks.
Also, my VMs (some of them) are broken now. The disk image can't be read in the new location (array) so the machine won't boot.
I have no idea what to do about this. One of these VMs runs my security cameras, so it's fairly urgent to get it running again...

June 9

Community Expert

37 minutes ago, Elmojo said:
Oh lordy, individually for each container?!
And I guess I'll have to map them all back once I've replaced the cache drive?
This really feels like something that should be done automatically when the share settings are changed...

Your docker.img and domain.img specify user shares (appdata, domains, system), not specific drives or pools, so it does work automatically if you set those shares to be on the array, and as long as you haven't specified a particular drive or pool for any docker or VM.

Then you set them back to prefer cache to get them moved back as explained at that link.

June 9

Community Expert

Here is a link in the official docs that also discusses this:

https://docs.unraid.net/unraid-os/using-unraid-to/manage-storage/cache-pools/#moving-files-between-a-pool-and-the-array

June 9

Community Expert

2 hours ago, Elmojo said:
And I guess I'll have to map them all back once I've replaced the cache drive?

If you have containers mapped to use /mnt/user/appdata then they will not need changing again as /mnt/cache/appdata is part of /mnt/user/appdata.

Direct mapping to /mnt/cache/? locations used to be a way of getting better performance because it bypassed the FUSE layer typically involved in user shares. However, current releases of Unraid have the 'exclusive' option for shares that are all on one device, which also bypasses FUSE. That achieves the same performance benefit while still using a path that is not device-specific.
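
If some mappings do point at the pool directly, the rewrite to the user-share path is mechanical. A minimal Python sketch, assuming the pool is named `cache` and is mounted at /mnt/cache; the function name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def to_user_share_path(path, pool="cache"):
    """Rewrite /mnt/<pool>/... to /mnt/user/...; leave other paths unchanged."""
    parts = PurePosixPath(path).parts
    if parts[:3] == ("/", "mnt", pool):
        return str(PurePosixPath("/mnt/user", *parts[3:]))
    return path


print(to_user_share_path("/mnt/cache/appdata/app-one"))  # pool-specific path
print(to_user_share_path("/mnt/user/appdata/app-two"))   # already a user-share path
```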

Solved by Vvei_61
