Sub 1 MB/s writes to array. VMs slow. Suspect I have a problem I can't find.



This morning I found my cache had filled overnight. NZBGet was downloading a bunch of new files, and the mover had been running all night long. I checked the syslog and saw errors about BTRFS; I didn't save the post, but searching pointed me to doing a rebalance of the BTRFS pool after clearing some space on the drive. I did that, then did a clean shutdown and reboot. I started the mover and was getting up to 100 MB/s.
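For reference, the rebalance the posts describe is roughly the following (the -dusage value and the /mnt/cache mount point here are just examples, not exactly what I typed):

# Check how the BTRFS pool's space is allocated vs. actually used
btrfs filesystem usage /mnt/cache

# Reclaim partially-used data chunks; start with a low usage filter
# and raise it if the balance doesn't free enough unallocated space
btrfs balance start -dusage=50 /mnt/cache

# Watch progress (a balance can take a while on a full pool)
btrfs balance status /mnt/cache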

Hours later it's slowed back to a crawl, between 300 and 800 KB/s. I enabled mover logging and it's been moving the same 3.3GB file for over an hour, so the reported write speeds seem to be accurate.
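If it would help narrow things down, I can run a direct write test against one of the array disks to rule the array side in or out; something like this (disk1 and the 1GB size are arbitrary, and I'd delete the test file afterwards):

# Write 1GB directly to an array disk, bypassing the page cache
dd if=/dev/zero of=/mnt/disk1/speedtest.bin bs=1M count=1024 oflag=direct
# ...and read it back
dd if=/mnt/disk1/speedtest.bin of=/dev/null bs=1M iflag=direct
rm /mnt/disk1/speedtest.bin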

I had to start work shortly after that and needed to spin up an Ubuntu VM in Unraid to do some testing. The install of that VM is taking ages; it took 20+ minutes for the Ubuntu installer to start booting.

My appdata and VMs + Docker are on my BTRFS cache pool..... or at least as far as I know they are. A week or so ago I replaced an SSD in that pool. To do so, I changed all my shares to not use the cache, let the mover move everything off the cache, then changed the appdata, domains, and system shares back to "cache prefer" and ran the mover again to move it back. It seemed to be running OK for a while after that.
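One thing I can double-check is whether those shares really did land back on the cache rather than being left on the array (this assumes the default share names; /mnt/user0 is the array-only view of the user shares):

# Anything still on the array for these shares (/mnt/user0 excludes the cache)
ls -la /mnt/user0/appdata /mnt/user0/domains /mnt/user0/system
# ...versus what actually lives on the cache pool
ls -la /mnt/cache/appdata /mnt/cache/domains /mnt/cache/system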

phoenix-diagnostics-20201106-1216.zip

I'm hoping some kind soul with more experience than I could take a look at my attached diag files and help point me to a possible solution.

To prevent the cache filling up from downloads / data ingest and breaking Docker / VMs, I just ordered a 2TB NVMe SSD that I'm planning to install and use as an Unassigned Device, just to hold the VM + Docker data. (If you happen to know of a guide on how to do that correctly I'd appreciate a link; the few references I've found so far looked like they could be from the early days of v6 and may be out of date.)
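My rough plan once it arrives, mostly so someone can tell me if this layout is wrong (the nvme_scratch label and paths below are placeholders I made up, assuming Unassigned Devices mounts the drive under /mnt/disks/):

# Assumed layout once the NVMe is mounted by Unassigned Devices
/mnt/disks/nvme_scratch/docker/docker.img   # Docker vdisk location
/mnt/disks/nvme_scratch/appdata/            # container appdata
/mnt/disks/nvme_scratch/domains/            # VM vdisks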


Looks like connection problems with this cache disk:

Nov  6 07:29:24 phoenix kernel: ata3.00: ATA-10: SPCC Solid State Disk, P1601544000000009646, V2.7, max UDMA/133
...
Nov  6 07:30:29 phoenix kernel: ata3.00: exception Emask 0x10 SAct 0x4000000 SErr 0x400001 action 0x6 frozen
Nov  6 07:30:29 phoenix kernel: ata3.00: irq_stat 0x08000000, interface fatal error
Nov  6 07:30:29 phoenix kernel: ata3: SError: { RecovData Handshk }
Nov  6 07:30:29 phoenix kernel: ata3.00: failed command: WRITE FPDMA QUEUED
Nov  6 07:30:29 phoenix kernel: ata3.00: cmd 61/08:d0:40:00:02/00:00:00:00:00/40 tag 26 ncq dma 4096 out
Nov  6 07:30:29 phoenix kernel:         res 40/00:d4:40:00:02/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Nov  6 07:30:29 phoenix kernel: ata3.00: status: { DRDY }
Nov  6 07:30:29 phoenix kernel: ata3: hard resetting link

Not directly related, except for the amount of cache space you are wasting:

 

Why do you have a 100G docker.img? 20G is usually more than enough unless you have some app misconfigured so it is writing into docker.img instead of to mapped storage. Have you had problems filling docker.img? I have 17 dockers and they are using less than half of a 20G docker.img.
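If you want to see what is actually taking the space inside docker.img before deciding anything (docker.img is loop-mounted at /var/lib/docker, so the normal Docker tooling works; these are generic checks, nothing specific to your setup):

# Space used by images, containers, volumes, and build cache
docker system df -v

# Largest directories inside the loop-mounted image
du -h -d1 /var/lib/docker | sort -h | tail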

6 minutes ago, trurl said:

Looks like connection problems with this cache disk:

(ata3 error log quoted above)

Not directly related, except for the amount of cache space you are wasting:

Why do you have a 100G docker.img? 20G is usually more than enough unless you have some app misconfigured so it is writing into docker.img instead of to mapped storage. Have you had problems filling docker.img? I have 17 dockers and they are using less than half of a 20G docker.img.

Hmm, I have no idea why that would be 100G. I do have a very large Plex library (running Plex in Docker) and it's not impossible that I misconfigured something along the way.

Is there a good way for me to reset that?

It looks like that is the SSD I recently replaced. I'll try to stop the array, shut down, and check its connection.

You mentioned cache space I'm wasting... even before I replaced that drive, the cache on the Main tab of Unraid has always listed 743 GB total space, and double-checking with a BTRFS calculator that seems to be correct. Are you saying that even though it's reporting as "green" in the Unraid UI with 743GB total space, no data can be written to that drive? Also, where did you find that in the main syslog, so I can check after I reseat the disk?
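(Mostly a note to myself: I'm guessing something like this would surface those entries after I reseat the disk; the pattern is just my attempt at matching the lines quoted above:)

grep -iE 'ata[0-9]+.*(SError|frozen|hard resetting link)' /var/log/syslog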

Thanks for the assistance, it's greatly appreciated.


 


Well, I did a clean shutdown and reseated that drive, but I still see that error in the syslog. I guess the other thing I can check is the SATA cables.
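Before I tear it all apart, I at least want to confirm which physical drive ata3 maps to so I chase the right cable; my understanding is the sysfs path shows the port, something like:

# Which sdX device sits on the ata3 port
ls -l /sys/block/ | grep ata3

# Then check that drive's SMART health before blaming the cable
# (sdX = whatever device the line above shows)
smartctl -a /dev/sdX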

Unfortunately some bloody idiot (me.......) designed this custom case to be the biggest pain in the ass to get access to that you can possibly imagine.

See https://linustechtips.com/topic/353971-an-introduction-to-project-egor-the-never-ending-story/

I've since put in a new motherboard and CPU; that old SR-2 gave its last breath folding for a cure for COVID :(


Ever since I put it back together, some things in the "basement" of that case have been.... "wonky". I guess I'm going to have to finally take the time to tear it all apart and try to find a bad SATA cable. Wish me luck!


Could someone explain a good way to make sure a given bay is connected correctly? If it isn't, as in this case, is there a particular line I could look for in the syslog on boot-up?

For example, if I reseat all the SATA cables for the attached drives, don't see "SError" in the syslog, and the drives show up in the Unraid UI, is that an indication all is good? Short of the drive itself being good, that is; I suppose it could still have SMART errors, etc.

Because it's such a nightmare to take this thing apart, I want to take a known-good drive and check all the SATA connections.
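What I had in mind as a pass/fail check per bay after reseating, in case that's the wrong approach and someone can correct me:

# After booting with the known-good drive in the bay under test:
grep -i SError /var/log/syslog            # expect no hits on a clean link
grep -i 'ata bus error' /var/log/syslog   # likewise
smartctl -H /dev/sdX                      # sdX = the drive in that bay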

