sdub Posted October 9, 2020

I've been an Unraid user for about a month now... see attached diagnostics. My array consists of 8 WD SATA drives ranging from 4 to 12TB, one 8TB Seagate drive, and one 12TB WD parity drive. I'm using a single 1TB Samsung NVMe drive for cache. It's worth noting that the Seagate drive is a freshly shucked STEB8000100, purchased in May 2018. I never had issues with it when it was connected via USB to my Windows box. All SMART diagnostics on all drives show healthy, write caching to the drives IS enabled, and the Fix Common Problems plugin shows no warnings.

The migration of all 40TB of my data into the array went slowly but very smoothly: write speeds of about 80MBps with parity enabled during the entire copy. By chance, "Drive 7" (the Seagate drive) didn't end up with any data. The system has been up and running fully for about a week.

This week, as new data was being written to the cache drive and moved to the various disks overnight, data started getting written to Drive 7. My first sign of trouble was that the mover was never finishing. Coincidentally, I'd turned on mover logging, so I was able to watch what was going on with "tail -f /var/log/syslog | grep -i move". Several gigabytes would get moved at full speed, then it would slow to a crawl: upwards of 7 hours for a 4GB file, about 150kBps. I also noticed that my CPU iowait was upwards of 10%.

Since there's no way to gracefully kill the mover script, I ran "ps aux | grep move" and issued "kill -9" on the PIDs, in an attempt to then shut down and reboot the server gracefully. This was not successful: the top-level /usr/local/sbin/move script died in the "D" state, requiring a hard reboot. Upon reboot, Unraid detected the unclean shutdown and initiated a parity check. The reboot did not magically fix my Drive 7 IO problems, however; the parity check claimed it would take 306 days, so I canceled it.

Overnight, the mover script kicked off again and ran into the same brick wall. I killed the mover sub-process successfully, but this time didn't try to kill the "move" script itself. I'm trying to move the files off the drive manually, but it's not going any faster. For what it's worth, I copied a file off another drive and got the expected 160+MBps over my 10GbE connection.

Any suggestions on what to do?

yellowstone-diagnostics-20201009-0857.zip
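For reference, the monitoring steps described above can be sketched as shell commands. This is a minimal sketch; `mover_pids` is a hypothetical helper, not part of Unraid, and the log path is the Unraid default.

```shell
#!/bin/sh
# Follow mover activity live (requires mover logging enabled in Settings):
#   tail -f /var/log/syslog | grep -i move

# List mover-related PIDs; the bracket trick keeps the pattern from
# matching its own process. Note: "kill -9" cannot reap a process stuck
# in the D (uninterruptible I/O) state -- it only dies once the blocked
# I/O completes, which is why the hard reboot was needed.
mover_pids() {
    ps aux | awk '/[m]ove/ {print $2}'
}

# iowait accounting without iostat: the 5th value after "cpu" in
# /proc/stat is cumulative iowait time in jiffies.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```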
JorgeB Posted October 9, 2020

Disk7 looks healthy, but don't run a parity check and try to move data at the same time; do one, then the other.
trurl Posted October 9, 2020

18 minutes ago, sdub said: "no way to gracefully kill the mover script"
sdub Posted October 9, 2020 (Author)

31 minutes ago, JorgeB said: "Disk7 looks healthy, but don't do a parity check and attempt to move data at the same time..."

Yes, I stopped the parity check to move the data off once I saw it would take 300+ days to complete.
sdub Posted October 9, 2020 (Author)

33 minutes ago, trurl said:

Thanks... wasn't aware of that. I hope they add a stop button in the UI in 6.9; it seems like it would be pretty straightforward.
trurl Posted October 9, 2020

13 minutes ago, sdub said: "Yes, I stopped the parity check to move the data off once I saw it would be 300+ days to complete"

And is it working OK now?
sdub Posted October 9, 2020 (Author)

No... Drive 7 is still slow as a glacier, and I'm not sure how to fix it or diagnose what's going on.
sdub Posted October 9, 2020 (Author)

To get back to a working system, what's the best way to get that data off Drive 7? I was thinking of the following procedure:

1. Remove Drive 7.
2. Let the parity create it virtually.
3. Move all of the parity-emulated Drive 7 data off to other drives.
4. Reformat the physical Drive 7 in Windows or elsewhere.
5. Bring it back into the array, allowing Unraid to clear it and resynchronize it (no data anymore).
6. Go into the share settings and prohibit any share from using Drive 7.
7. Run the diskspeed diagnostics you linked above.
8. If I'm able to get it fixed, allow Drive 7 to be re-included in the shares. If not, I guess either replace it with a new 8TB HDD or define a new config without it.

Does that make sense?
trurl Posted October 9, 2020

There is no point in step 4 (probably some confusion about the meaning of "format"), and at step 5 it won't clear the disk, it will rebuild it. Even an empty disk is rebuilt, since it needs to be made in sync with parity again. If the disk has an empty filesystem, it will be rebuilt with an empty filesystem.

Another possibility would be, after step 3, New Config without the disk and rebuild parity. Then you would be protected again, and could proceed however you wanted with testing that disk, and later add it or a new disk (Unraid will clear the added disk so parity is maintained).
trurl Posted October 9, 2020

And step 2 is usually called "emulate". Nothing is really "created"; it just reads parity and all the other disks to calculate the data for the missing disk using the parity calculation.
sdub Posted October 9, 2020 (Author)

6 minutes ago, trurl said: "There is no point in step 4 ... Another possibility would be, after step 3, New Config without the disk and rebuild parity."

That makes more sense... new plan:

1. Restart the array with Disk 7 marked "missing", letting parity emulate the drive.
2. Move all of the parity-emulated Drive 7 data off to other drives.
3. Restart the array with Disk 7 marked "available", allowing parity to rebuild it (as empty).
4. Go into the share settings and prohibit any share from using Drive 7.
5. Run the diskspeed diagnostics you linked above.
6. If I'm able to get it fixed, allow Drive 7 to be re-included in the shares. If not, I guess either replace it with a new 8TB HDD or define a new config without it.
trurl Posted October 9, 2020

If there is a speed problem with the actual disk and you use the same disk for the rebuild, then expect the rebuild to have a speed problem too.
sdub Posted October 9, 2020 (Author)

I ran the diskspeed tool on the drive as an unassigned device and found no problems... 100+MBps across the board. Maybe I'll try a preclear to see if that turns up any problems. Very strange.
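A quick way to spot-check sequential reads at different points on an unassigned drive, without any plugin. This is a sketch; /dev/sdX is a hypothetical device name, and the offsets assume an 8TB drive:

```shell
#!/bin/sh
# Read a 256 MiB sample starting at a given MiB offset and print dd's
# throughput summary. On a real block device, add iflag=direct to
# bypass the page cache so you measure the disk, not RAM.
read_sample() {
    dev=$1; offset_mb=$2
    dd if="$dev" of=/dev/null bs=1M count=256 skip="$offset_mb" 2>&1 | tail -n1
}

# Example: sample the start, middle, and end of the drive
# read_sample /dev/sdX 0
# read_sample /dev/sdX 4000000
# read_sample /dev/sdX 7600000
```

Note this only exercises reads; a read-only pass can look perfectly healthy on a drive whose problem is write-side (which is consistent with what happened here).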
sdub Posted October 9, 2020 (Author)

I suppose the source (cache drive) could have been the problem rather than the Seagate SATA destination drive... With the drive still offline, I'll try rerunning the mover. This seems unlikely, though, as the cache is a brand-new Samsung 970 EVO Plus and has otherwise been fine.
sdub Posted October 10, 2020 (Author)

Did as I mentioned previously: removed the drive from the array and used the parity emulation to move the data off. While the drive was out, I ran SMART tests and DriveSpeed; no problems at all. Put it back into the array and started the rebuild of that disk. The behavior was very similar: for the first hour it ran at full speed, 100+MBps. After that it started slowing, and by hour 4 it was down to 200kBps. I think this drive is just garbage. Never again will I buy a Seagate.

Here's a somewhat related article that gives me déjà vu: https://forums.tomshardware.com/threads/seagate-barracuda-2tb-slow-issue.3443725/

I'm running a preclear on it, just begging it to actually give an error, but I doubt it will. I may pull it out and throw it in my Windows machine to do more testing with CrystalDiskMark or something, just to be sure it's not a bad controller channel.
dfill Posted October 10, 2020

It's an SMR drive. When doing your testing in Windows, make sure you write enough data to exceed the PMR cache. I'm gonna go out on a limb here and say you'll see the same results (performance going to crap).
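The same cache-exhaustion test can be run on the Linux side. A hypothetical sketch; the 50GB figure is a guess, since the size of the drive's persistent cache region is undocumented:

```shell
#!/bin/sh
# Sustained sequential write with a final fdatasync, so the reported
# rate includes flushing to media. On an SMR drive the rate typically
# collapses once the persistent (PMR/media) cache region fills and the
# drive must rewrite shingled zones.
sustained_write() {
    target=$1; size_mb=$2
    dd if=/dev/zero of="$target" bs=1M count="$size_mb" conv=fdatasync 2>&1 | tail -n1
}

# Example: write ~50 GB to a filesystem mounted on the suspect drive
# sustained_write /mnt/smr_test/fill.bin 50000
```

If throughput starts at 100+MBps and decays to a crawl partway through, that matches the mover and rebuild behavior seen in this thread.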
JorgeB Posted October 10, 2020

As mentioned, it's an SMR drive. Seagate SMR drives usually perform OK with Unraid, but that particular model has had other bad performance reports before.