New Unraid user - One drive is super slow and I don't know how to fix it

sdub · October 9, 2020

I've been an Unraid user for about a month now... see attached diagnostics. My array consists of 8 WD SATA drives ranging from 4 to 12TB, one 8TB Seagate drive, and one 12TB WD Parity drive. I'm using a single 1TB Samsung NVMe drive for Cache. It's worth noting that the Seagate drive is a freshly shucked STEB8000100, purchased in May 2018. I never had issues with it when it was connected via USB to my Windows box. All SMART diagnostics on all drives show healthy. Write caching to the drives IS enabled. Fix common problems plugin shows no warnings.

The migration of all 40TB of my data into my Array went slowly but very smoothly... write speeds of about 80MBps with parity enabled during the entire copy. When the data was copied into the array, "Drive 7" the Seagate drive, didn't end up with any data by chance. The system has been up and running fully for about a week.

This week, as new data was being written to the cache drive and moved to the various disks overnight, data started getting written to "Drive 7". My first sign of trouble was that the "mover" was never finishing. Coincidentally, I'd turned on mover logging, so I was able to watch what was going on with "tail -f /var/log/syslog | grep -i move". Several gigabytes would get moved over at full speed, then it would slow to a crawl.... upwards of 7hr for a 4GB file - about 150kBps. I also noticed that my CPU iowait was upwards of 10%.

Since there's no way to gracefully kill the mover script, I did a "ps -aux | grep move" and did a "kill -9" on the PID's, in an attempt to do a graceful shutdown and reboot the server. This was not successful... the top level /usr/local/sbin/move script died in the "D" state, requiring a hard reboot. Upon reboot Unraid detected the unclean shutdown and initiated a parity check. The reboot did not magically fix my "Drive 7" IO problems, however... the parity check was claiming it woulstake 306 days, so I canceled it. Overnight, the mover script kicked off again, and is running into the same brick wall. I killed the mover sub-process successfully, but this time didn't try to kill the "move" script.

I'm trying to move the files off the drive manually, but it's not going any faster. For what it's worth, I copied a file off another drive, and got the expected 160+MBps on my 10GBE connection.

Any suggestions on what to do?

yellowstone-diagnostics-20201009-0857.zip

Edited October 9, 2020 by sdub

JorgeB · October 9, 2020

Disk7 looks healthy, but don't do a parity check and attempt to move data at the same time, do one or the other first.

trurl · October 9, 2020

18 minutes ago, sdub said:

no way to gracefully kill the mover script

sdub · October 9, 2020

31 minutes ago, JorgeB said:

Disk7 looks healthy, but don't do a parity check and attempt to move data at the same time, do one or the other first.

Yes, I stopped the parity check to move the data off once I saw it would be 300+ days to complete

sdub · October 9, 2020

33 minutes ago, trurl said:

Thanks... wasn't aware of that. I hope they add a stop button in the UI in 6.9... seems like it would be pretty straight forward.

trurl · October 9, 2020

13 minutes ago, sdub said:

Yes, I stopped the parity check to move the data off once I saw it would be 300+ days to complete

And is it working OK now?

sdub · October 9, 2020

No... Drive 7 is still slow as a Glacier and I'm not sure how to fix it, or diagnose what's going on.

trurl · October 9, 2020

sdub · October 9, 2020

To get back to a working system, what's the best way to get that data off "drive 7"? I was thinking of the following procedure:

Removing Drive 7
Let the parity create it virtually
Move all of the parity-emulated Drive 7 data off to other drives
Reformat the physical Drive 7 in Windows or elsewhere
Bring it back into the array, allowing Unraid to clear it and resynchronize it (no data anymore)
Go into the share settings and prohibit any share from using "drive 7"
Run the diskspeed diagnostics you linked above
If I'm able to get it fixed, allow drive 7 to be re-included in the shares. If not, I guess either replace it with a new 8TB HDD or define a new config without it.

Does that make sense?

trurl · October 9, 2020

There is no point in step 4 (probably some confusion about the meaning of format), and at step 5, it won't clear the disk, it will rebuild it. Even an empty disk is rebuilt since it needs to be made in sync with parity again. If the disk has an empty filesystem, it will be rebuilt with an empty filesystem.

Another possibility would be, after step 3, New Config without the disk and rebuild parity. Then you would be protected again, and could proceed however you wanted with testing that disk, and later add it or a new disk (Unraid will clear added disk so parity is maintained).

trurl · October 9, 2020

And step 2 is usually called "emulate". Nothing is really "created". It just reads parity and all other disks to calculate the data for the missing disk using the parity calculation.

sdub · October 9, 2020

6 minutes ago, trurl said:

There is no point in step 4 (probably some confusion about the meaning of format), and at step 5, it won't clear the disk, it will rebuild it. Even an empty disk is rebuilt since it needs to be made in sync with parity again. If the disk has an empty filesystem, it will be rebuilt with an empty filesystem.

Another possibility would be, after step 3, New Config without the disk and rebuild parity. Then you would be protected again, and could proceed however you wanted with testing that disk, and later add it or a new disk (Unraid will clear added disk so parity is maintained).

That makes more sense... new plan:

Restart array, marking Disk 7 as "missing", letting the parity emulate the drive
Move all of the parity-emulated Drive 7 data off to other drives
Restart array, marking Disk 7 as "available", allowing the parity to rebuild it (as empty)
Go into the share settings and prohibit any share from using "drive 7"
Run the diskspeed diagnostics you linked above
If I'm able to get it fixed, allow drive 7 to be re-included in the shares. If not, I guess either replace it with a new 8TB HDD or define a new config without it.

trurl · October 9, 2020

If there is a speed problem with the actual disk, and you use the same disk for rebuild, then expect rebuilding to also have a speed problem.

sdub · October 9, 2020

I ran the diskspeed tool on the drive as an unassigned device and no problems.... >100MBps across the board.

Maybe I’ll try a preclear to see if it has any problems. Very strange.

sdub · October 9, 2020

I suppose the source (cache drive) could have been the problem and not the Seagate SATA destination drive... With the drive still offline I’ll try to rerun the mover.

This seemed unlikely as the cache is a brand new Samsung 970 EVO Plus, and it’s been otherwise fine.

sdub · October 10, 2020

Did as I mentioned previously... removed the drive from the array, used the parity emulation to move the data off.

While the drive was out, I did SMART tests and ran DriveSpeed. No problems at all. Threw it back into the array and started the rebuild of that disk. The behavior was very similar... for the first hour, it was full speed... 100+MBps. After that it started slowing, and by hour 4 it was down to 200kBps.

I think this drive is just garbage. Never again will I buy a Seagate. Here’s a somewhat related article that gives me deja Vu.

https://forums.tomshardware.com/threads/seagate-barracuda-2tb-slow-issue.3443725/

I’m running a preclear on it, just begging it to actually give an error, but I doubt it will.

I may pull it out and throw it in my Windows machine to do more testing with CrystalDiskMark or something just to be sure it’s not a bad controller channel or something.

Sent from my iPhone using Tapatalk

dedwardsdale · October 10, 2020

It's an SMR drive.

When doing your testing in Windows ensure you are writing enough data to exceed the PMR cache, I'm gonna go out on a limb here and say you'll see the same results (performance going to crap).

JorgeB · October 10, 2020

Like mentioned those it's an SMR drive, Seagate SMR drives usually perform OK with Unraid but that particular model has had other bad performance reports before.

New Unraid user - One drive is super slow and I don't know how to fix it

Recommended Posts

sdub

Link to comment

JorgeB

Link to comment

trurl

Link to comment

sdub

Link to comment

sdub

Link to comment

trurl

Link to comment

sdub

Link to comment

trurl

Link to comment

sdub

Link to comment

trurl

Link to comment

trurl

Link to comment

sdub

Link to comment

trurl

Link to comment

sdub

Link to comment

sdub

Link to comment

sdub

Link to comment

dedwardsdale

Link to comment

JorgeB

Link to comment

Join the conversation