High Ping Spikes (sometimes >500ms)

January 14

Hi guys,

So recently I’ve noticed that my Plex has been buffering a lot and some of my hosted sites are running really slowly. Sometimes they even show as "down" even though the server is definitely up.

I checked and saw that my server is hitting over 500ms ping spikes when pinging the gateway. At first I thought it was just a bad network cable, so I changed it out, but the same thing is still happening.

I was looking into it and saw that shfs is using somewhere around 200% CPU. I really don't know what it is and I haven't modified any settings lately. My only recent changes were adding a parity disk and a new M.2 SSD (which is where my appdata is hosted now). Those changes were a few months ago, but this problem just started showing up at the end of 2025.

I’ve been trying to fix it by myself since then with no luck and I still can't figure out what's causing it. I've attached my diagnostics and a screenshot of the spikes.

Any help would be great, thanks!

tower-diagnostics-20260113-2216.zip

January 14

Community Expert

18 minutes ago, VozDeOuro said:
where my appdata is hosted now

Actually your appdata is on multiple pools, as is system

appdata                           shareUseCache="only"    # Share exists on nvme-red, cache-sata-w, cache-sata, cache
system                            shareUseCache="prefer"  # Share exists on cache, cache-sata
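To check this on the server itself, here is a minimal Python sketch. It assumes the usual Unraid layout, where each pool and array disk is mounted under /mnt and the merged share views live in /mnt/user and /mnt/user0. The function name pools_holding_share is made up for this example.

```python
import os

def pools_holding_share(share, mnt_root="/mnt"):
    """Return the mounts under mnt_root that contain a top-level folder named `share`."""
    # Assumption: "user" and "user0" are the merged user-share views, not physical pools.
    skip = {"user", "user0"}
    if not os.path.isdir(mnt_root):
        return []
    found = []
    for name in sorted(os.listdir(mnt_root)):
        if name in skip:
            continue
        # os.path.isdir() returns False instead of raising if the path is unreadable.
        if os.path.isdir(os.path.join(mnt_root, name, share)):
            found.append(name)
    return found
```

Calling pools_holding_share("appdata") should then list every pool or disk that still has an appdata folder, which is one way to see when the leftovers are gone.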

You should run an extended self-test on cache-sata.

Jan 12 03:52:22 Tower kernel: ata2.00: configured for UDMA/133
Jan 12 03:52:22 Tower kernel: sd 1:0:0:0: [sdd] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=0s
Jan 12 03:52:22 Tower kernel: sd 1:0:0:0: [sdd] tag#18 Sense Key : 0x3 [current] 
Jan 12 03:52:22 Tower kernel: sd 1:0:0:0: [sdd] tag#18 ASC=0x11 ASCQ=0x4 
Jan 12 03:52:22 Tower kernel: sd 1:0:0:0: [sdd] tag#18 CDB: opcode=0x28 28 00 17 11 04 90 00 00 08 00
Jan 12 03:52:22 Tower kernel: I/O error, dev sdd, sector 386991252 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
Jan 12 03:52:22 Tower kernel: ata2: EH complete
Jan 12 03:52:23 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x80004 SErr 0x0 action 0x0
Jan 12 03:52:23 Tower kernel: ata2.00: irq_stat 0x40000008
Jan 12 03:52:23 Tower kernel: ata2.00: failed command: READ FPDMA QUEUED
Jan 12 03:52:23 Tower kernel: ata2.00: cmd 60/08:98:90:04:11/00:00:17:00:00/40 tag 19 ncq dma 4096 in
Jan 12 03:52:23 Tower kernel:         res 41/40:08:94:04:11/00:00:17:00:00/40 Emask 0x409 (media error) <F>
Jan 12 03:52:23 Tower kernel: ata2.00: status: { DRDY ERR }
Jan 12 03:52:23 Tower kernel: ata2.00: error: { UNC }

187 Reported_Uncorrect      -O--CK   100   100   ---    -    2541
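To see how often this happens and on which device, here is a rough Python sketch that counts kernel "I/O error" lines per device. The regular expression is based only on the line format shown above, and the function name is made up for this example. Reading from /var/log/syslog is an assumption about where the log lives.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: I/O error, dev sdd, sector 386991252 op 0x0:(READ) ..."
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")

def io_errors_by_device(lines):
    """Count kernel I/O error lines per block device."""
    counts = Counter()
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            counts[match.group("dev")] += 1
    return counts

sample = [
    "Jan 12 03:52:22 Tower kernel: I/O error, dev sdd, sector 386991252 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2",
    "Jan 12 03:52:22 Tower kernel: ata2: EH complete",
    "Jan 12 03:52:23 Tower kernel: ata2.00: error: { UNC }",
]
print(io_errors_by_device(sample))
```

On the server, the same function could be fed an open log file, for example io_errors_by_device(open("/var/log/syslog")), if the log is at that path.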

January 14

Author

23 minutes ago, trurl said:
Actually your appdata is on multiple pools, as is system

It's just some leftovers; I will finish moving everything to the new nvme-red.

23 minutes ago, trurl said:
You should run an extended self-test on cache-sata.

I ran the SMART test on the failing drive as well.

Do you think this is related to the high ping values?

tower-smart-20260114-0131.zip

January 14

Community Expert

4 minutes ago, VozDeOuro said:
I ran the SMART test on the failing drive as well.

# 1  Extended offline    Completed: read failure       00%     41902         401213152

Replace ASAP

January 14

Author

10 minutes ago, trurl said:
Replace ASAP

Thanks, did you see anything that could be causing the ping spikes?

January 14

Community Expert

15 minutes ago, VozDeOuro said:
Thanks, did you see anything that could be causing the ping spikes?

He told you the reason already.

Your faulty drive is causing delays because it resets all the time and all other processes are halted during the reset.

January 14

Author

8 minutes ago, MAM59 said:
He told you the reason already.
Your faulty drive is causing delays because it resets all the time and all other processes are halted during the reset.

Oh, OK. Is there a way to disable the drive for now by software? I don't have the money to replace it at the moment.

January 14

Community Expert

That would not help; the resets block the bus (and therefore the whole system) whether the drive is addressed or not.

You need to take it out physically, but first try to move away the data from it (this could take a very long time).

January 14

Community Expert

9 hours ago, VozDeOuro said:
disable the drive for now by software

9 hours ago, MAM59 said:
try to move away the data from it

Looks like cache-sata-w pool has plenty of space. You could just move it all there and get rid of cache-sata pool completely.

January 14

Author

41 minutes ago, trurl said:
Looks like cache-sata-w pool has plenty of space. You could just move it all there and get rid of cache-sata pool completely.

The problem is that cache-sata-w is a normal SSD, not a server-grade one. Is that as big a problem as I think it is?

January 14

Community Expert

5 minutes ago, VozDeOuro said:
The problem is that cache-sata-w is a normal SSD, not a server-grade one. Is that as big a problem as I think it is?

Why do you think it is?

It looks like it is NVMe, unlike your cache-sata and nvme-red, which are M.2 SATA.

January 14

Community Expert

11 minutes ago, VozDeOuro said:
The problem is that cache-sata-w is a normal SSD, not a server-grade one. Is that as big a problem as I think it is?

No, you don't need server-grade hardware to run Unraid. I run WD Blue SATA SSDs for my cache pool. However, I have them running in mirrored pairs for redundancy if/when one of them fails.

Ideally you want components rated for your expected workloads, but the only thing that may happen is that you wear the drive out sooner rather than later.

Move the data ASAP or you risk losing it entirely. That drive is about to die. Your last concern should be whether or not the other disk is server grade.

Edited January 14 by MowMdown

January 15

Author

I removed the faulty drive, and the high ping issue is still happening.
What else could it be?

Edited January 15 by VozDeOuro

January 15

Author

Here are the newest diagnostics, if that helps.

tower-diagnostics-20260114-2133.zip

January 15

Community Expert

It can be anything, because you are using a WireGuard tunnel (I don't know why). This can also be a result of the tunnel server on the other end having hiccups, your ISP, or failing DNS queries.

January 15

Author

1 minute ago, MAM59 said:
It can be anything, because you are using a WireGuard tunnel (I don't know why). This can also be a result of the tunnel server on the other end having hiccups, your ISP, or failing DNS queries.

The ping test runs on the server against the local gateway. The VPN is only there so I can reach the LAN and access the server to test what is happening.

So the high ping is local:

From server 192.168.0.10 to gateway 192.168.0.1

January 15

Community Expert

Sorry, I cannot see any local reason in your diagnostics anymore.

January 15

Author

3 minutes ago, MAM59 said:
Sorry, I cannot see any local reason in your diagnostics anymore.

Do you have any idea what kind of logs or processes I could look at to help find a diagnosis?

January 15

Community Expert

No clue. You are running a lot of interpreter stuff (Java, Python), and what happens inside those runtimes is invisible to the system. Python takes the largest amount of CPU time, but this does not automatically prove it is the culprit.

It must be something that is able to block the whole system for a certain time (for instance, big writes that overwhelm the internal buffers).

But because such events are only visible for a split second, the logs do not catch them.

Sticking to networking, I would suspect missing or wrong flow control, but you are running at 1G speed only, and flow control is not necessary there.

To be safe, I would disable the WiFi chip in the BIOS so that Unraid does not see it and maybe try to initialize it (which also costs time and causes hiccups). But this is a very long shot, and I don't think it is the reason. Still, just try it and see; it does no harm.

January 15

Author

1 minute ago, MAM59 said:
No clue. You are running a lot of interpreter stuff (Java, Python), and what happens inside those runtimes is invisible to the system. Python takes the largest amount of CPU time, but this does not automatically prove it is the culprit.
It must be something that is able to block the whole system for a certain time (for instance, big writes that overwhelm the internal buffers).
But because such events are only visible for a split second, the logs do not catch them.
Sticking to networking, I would suspect missing or wrong flow control, but you are running at 1G speed only, and flow control is not necessary there.
To be safe, I would disable the WiFi chip in the BIOS so that Unraid does not see it and maybe try to initialize it (which also costs time and causes hiccups). But this is a very long shot, and I don't think it is the reason. Still, just try it and see; it does no harm.

I can try running a diagnostic when the pings are high too, but it's so random. Sometimes it happens a lot within a few minutes, and then nothing for an hour.

I've always run those Java and Python apps, and they never caused me any problems. In 6 years of use, this issue is pretty recent, so I don't believe they are the cause.

I will try to collect more relevant evidence.

January 15

Community Expert

1 minute ago, VozDeOuro said:
I can try running a diagnostic when the pings are high too, but it's so random. Sometimes it happens a lot within a few minutes, and then nothing for an hour.

Yeah, I was afraid it would happen that way.

I did not say Java or Python are bad. It is just that these languages allow constructs that may cause a really heavy demand for memory, which results in a "system shock" to free up enough memory to satisfy the demand. And a second later, the memory is released again. Very simple-looking statements that copy b into a may trigger this, because b can be a gigantic array with subarrays and so on (a plain "a = b" only binds a reference in both languages). This is programmer stuff, but often even programmers have no clue what they are doing :-)
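As a rough illustration of the memory point (a sketch, not something taken from the diagnostics): in Python a plain "a = b" only binds a second name to the same object, while an explicit copy of a large nested list allocates a whole second structure. The helper name allocated_bytes and the list size are made up for this example, and the exact numbers vary between Python versions.

```python
import copy
import tracemalloc

def allocated_bytes(action):
    """Run action() and return the bytes of Python allocations it left alive."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    result = action()  # keep the result alive while measuring
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return after - before

# A nested list standing in for "a gigantic array with subarrays".
b = [[0] * 100 for _ in range(2_000)]

alias_cost = allocated_bytes(lambda: b)                 # like "a = b": no copy is made
copy_cost = allocated_bytes(lambda: copy.deepcopy(b))   # builds a second full structure

print(f"a = b           -> {alias_cost} bytes")
print(f"a = deepcopy(b) -> {copy_cost} bytes")
```

The first figure should be close to zero and the second well over a megabyte, so it is the explicit copies (or large temporary results) that create sudden memory demand.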

Anyway, there is nothing you can do about this.

All you can do is turn off these apps/dockers one by one and watch whether the pings are still bad or become stable.
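To make that watching less tedious, here is a minimal Python sketch that logs only the spikes with a timestamp, so they can be lined up with the time each container was stopped. It assumes a Linux ping that prints "time=… ms" on each reply. The 100 ms threshold, the log file name, and the function names are made up for this example; the gateway address is the one from this thread.

```python
import re
import subprocess
from datetime import datetime

RTT = re.compile(r"time=([\d.]+) ms")

def parse_rtt_ms(line):
    """Return the round-trip time in ms from one line of ping output, or None."""
    match = RTT.search(line)
    return float(match.group(1)) if match else None

def log_spikes(host="192.168.0.1", threshold_ms=100.0, logfile="ping_spikes.log"):
    """Ping host until interrupted and append a timestamped line for every spike."""
    proc = subprocess.Popen(["ping", host], stdout=subprocess.PIPE, text=True)
    with open(logfile, "a") as out:
        for line in proc.stdout:
            rtt = parse_rtt_ms(line)
            if rtt is not None and rtt >= threshold_ms:
                stamp = datetime.now().isoformat(timespec="seconds")
                out.write(f"{stamp} {rtt:.1f} ms\n")
                out.flush()  # keep the log current even if the run is killed
```

Calling log_spikes() in a terminal and stopping it with Ctrl+C leaves a short file of spike times to compare against when each app was turned off.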

Once you have found the bad one, there may be a chance to fix it. I don't know.

Anyway, it could be a long and depressing search, sorry.

January 15

Author

2 minutes ago, MAM59 said:
Yeah, I was afraid it would happen that way.
I did not say Java or Python are bad. It is just that these languages allow constructs that may cause a really heavy demand for memory, which results in a "system shock" to free up enough memory to satisfy the demand. And a second later, the memory is released again. Very simple-looking statements that copy b into a may trigger this, because b can be a gigantic array with subarrays and so on (a plain "a = b" only binds a reference in both languages). This is programmer stuff, but often even programmers have no clue what they are doing :-)
Anyway, there is nothing you can do about this.
All you can do is turn off these apps/dockers one by one and watch whether the pings are still bad or become stable.
Once you have found the bad one, there may be a chance to fix it. I don't know.
Anyway, it could be a long and depressing search, sorry.

I was able to get two diagnostics just now lol. I just had two massive spikes, so I started the diagnostic as soon as the ping started spiking.
As you can see, the ping goes very high, making a lot of things think the server is down.

tower-diagnostics-20260115-0141.zip ping.txt tower-diagnostics-20260115-0140.zip

January 15

Community Expert

Yeah, I am aware that it happens, but I cannot see anything in Unraid that may cause it.
