Posts posted by stephen_m64

  1. 14 hours ago, johnnie.black said:

    SATA port multipliers are usually nothing but trouble and not recommended, but some can work reasonably well.

    The M.2 adapter is a PCIe Gen3 x2 SATA controller, not a port multiplier.

    3 hours ago, mrbilky said:

    Hmm, will have to follow his channel to see if he reports problems; he just uploaded this the other day, so time will tell.

    JMicron makes OK chipsets, so this adapter is likely fine; how well it holds up over time and with heat would be interesting to see.
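    Not part of the original post, but a minimal sketch of how one could double-check this on any Linux box: a PCIe SATA controller shows up as its own PCI device, while a port multiplier does not. Python is only wrapping lspci here, and the filter strings are assumptions.

    ```python
    # Sketch: list the SATA/AHCI controllers the kernel can see via lspci.
    # A standalone M.2 SATA controller appears as its own PCI device; a port
    # multiplier would not show up here at all.
    import subprocess

    def list_sata_controllers():
        """Return the lspci lines describing SATA/AHCI controllers."""
        out = subprocess.run(["lspci", "-nn"], capture_output=True, text=True, check=True)
        return [line for line in out.stdout.splitlines()
                if "SATA" in line or "AHCI" in line]

    if __name__ == "__main__":
        for line in list_sata_controllers():
            print(line)  # expect a JMicron entry for an adapter like this
    ```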

  2. On 7/2/2019 at 10:45 AM, binhex said:

    Check out the Odroid N2, it's faster than the Raspberry Pi 4 and has 0 thermal issues, although it is more expensive

     

    I'll second the Odroid N2 option.

     

    I have every SBC you can think of, and the N2 is one of my favorites. It's a great board: decently powerful, decent I/O, and it supports eMMC.

     

    I did grab two Pi 4s though; it's finally a good upgrade for the Pi scene, but as mentioned you need good cooling. Check out the Flirc case for the Pi 4 as a sleek-looking passive-heatsink case; it's quite good.

  3. Fast is subjective; the drive's speeds are decent for the price and faster than platter drives.

    If I had the money I would have truly fast NVMe drives, but that's neither here nor there.

     

    No disrespect, but please have something constructive to add when replying in this thread.

     

    As requested above, the debug logs are attached. I did remove a few files containing sensitive info.

    tower-diagnostics-20190801-0115.zip

  4. 1 hour ago, Benson said:

    Problem relates to SSD performance; it depends on how much read or write throughput and the I/O response time when the drive is busy.

     

    The performance of the SSDs is not really in question; they are cheaper SATA SSDs, yes, but still far faster than any spinning-disk media, especially for simple workloads like a sequential write.

     

    The system should not crumble and become unresponsive during large writes; as stated above, this did not happen previously.
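    For clarity, this is roughly the kind of simple sequential-write workload I'm talking about; a hypothetical Python sketch, where the /mnt/cache path, file size, and block size are placeholders rather than my exact test.

    ```python
    # Sketch of a simple sequential write: stream a few GiB to the cache mount,
    # fsync, and report throughput. Path and sizes are assumptions.
    import os
    import time

    def sequential_write_test(path="/mnt/cache/seqtest.bin", size_gb=8, block_mb=4):
        """Write size_gb of data in block_mb chunks, fsync, and report MB/s."""
        block = os.urandom(block_mb * 1024 * 1024)
        start = time.monotonic()
        with open(path, "wb") as f:
            for _ in range((size_gb * 1024) // block_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data really hit the drive
        elapsed = time.monotonic() - start
        os.remove(path)
        print(f"{size_gb} GiB in {elapsed:.1f} s -> {size_gb * 1024 / elapsed:.0f} MB/s")

    if __name__ == "__main__":
        sequential_write_test()
    ```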

  5. I already looked at the SMART data for the drives, and other than the vendor-unique SMART tables, things such as reallocated sectors and the media wear-out indicators look fine. The drives are actually quite new upgrades from smaller drives, with about 3.4 months of power-on hours.
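    Roughly what I was looking at, expressed as a hedged sketch; the /dev/sdX names are placeholders for the cache SSDs, and smartctl is assumed to be available on the system.

    ```python
    # Sketch: pull the SMART attribute table for a quick look at reallocated
    # sectors and wear indicators. Device names are assumptions.
    import subprocess

    def smart_attributes(device):
        """Return smartctl's attribute table (-A) for a device."""
        out = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
        return out.stdout

    for dev in ("/dev/sdb", "/dev/sdc"):  # assumed cache SSD device names
        print(f"--- {dev} ---")
        print(smart_attributes(dev))      # check reallocated sectors / wear-out attributes
    ```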

     

    Even with a TRIM on the drive while it is near empty (the most optimal state for writes), the issue is still seen, albeit taking a bit longer to appear; this is mentioned in the posts above.

     

    I hate having to take the system down over and over again, but I plan to pull the drives and test them individually for failure, time permitting.

     

     

  6. 5 minutes ago, trurl said:

    Depends. We haven't seen your diagnostics, and you haven't mentioned whether or not you are completely filling cache faster than the data can be moved off. Since you haven't mentioned it, I assumed you didn't know, and we would have to wait on your diagnostics to fill in the details.

    I have tested with the cache drive at a normal level of use (about 60% full) and with it near empty (around 60GB used), with and without TRIM, and still see the issue. (Post-TRIM, the issue will still happen but takes longer to appear as the drive's written blocks are used up.)

     

    Happy to answer any questions, and I will get the zip file posted tonight.

     

    Note: I work in the storage industry, in R&D of solid-state drives. I have a good bit of experience in this area; I just don't know how or why this issue has appeared in Unraid all of a sudden.

  7. True, and transfers directly to the array seem to be fine. But the entire point of 10GbE is to reduce the bottleneck of 1GbE with multiple users and to allow quick offload of large files to the cache, so the server can move them to the array at its leisure.

     

    Working with or moving files at near-native SSD speeds is a key goal (and it looked to work well for about two weeks).

     

    You won't "question the wisdom of caching" after transferring 32GB+ files over 10GbE to SSDs vs. 1GbE to spinning disks. 😂
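    A rough sketch of the arithmetic behind that, using nominal line rates only (real-world numbers will be somewhat lower once protocol overhead and disk speed are factored in).

    ```python
    # Back-of-the-envelope transfer times: nominal link speed, ignoring protocol
    # overhead and assuming the drives on each end can keep up with the wire.
    def transfer_seconds(file_gb, link_gbps):
        return file_gb * 8 / link_gbps

    for link_gbps in (1, 10):
        secs = transfer_seconds(32, link_gbps)
        print(f"32 GB over {link_gbps} GbE: ~{secs:.0f} s ({secs / 60:.1f} min)")
    # ~26 s on 10GbE vs ~4.3 min on 1GbE -- and a 1GbE transfer straight to the
    # array is usually capped even lower by the spinning disks and parity writes.
    ```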

  8. Hello.

     

    I started to get an issue similar in scope to the issues users were having here. More specifically, large writes to the cache drive slow to a crawl or near stop and cause the Dockers and VMs to become unresponsive (the Unraid UI often still works).

     

    Everything was working fine for a week or two after one of the last changes I made, which was the addition/upgrade to 10GbE. Large transfers were fast, fantastic, and well worth the upgrade, but now things slow to a crawl and the Dockers and VMs freeze or possibly crash while the cache disk(s) attempt to write the large data. iowait climbs to a very large value, around 70%, as reported by netdata/top.
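    For anyone wanting to reproduce the measurement without netdata/top, a small sketch of how iowait can be sampled directly from /proc/stat (the 5-second interval is just an example).

    ```python
    # Sketch: read the aggregate "cpu" line from /proc/stat twice; the 5th value
    # is cumulative iowait jiffies, so the delta gives iowait over the interval.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            return [int(v) for v in f.readline().split()[1:]]

    def iowait_percent(interval=5.0):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        delta = [b - a for a, b in zip(before, after)]
        return 100.0 * delta[4] / sum(delta)  # index 4 == iowait

    if __name__ == "__main__":
        print(f"iowait over the last 5 s: {iowait_percent():.1f}%")
    ```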

     

    The 10GbE change does not seem to be at fault, since even transfers limited to 1GbE will still cause the issue, albeit not as quickly.

     

    Debug steps I have attempted so far, with no positive results:

    • Checked logs; nothing outstanding seen in relation to drive read/write errors or health issues
    • Converted the cache from the original setup (btrfs RAID 1) to a single-drive XFS
    • Checked Ethernet cables
    • Checked SATA cables
    • Attempted to run TRIM on the drives (rough sketch below)
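    For the TRIM step, roughly what was run, assuming the cache pool is mounted at the usual /mnt/cache location and that fstrim is available on the system.

    ```python
    # Sketch of the TRIM attempt against the cache mount point (an assumption).
    import subprocess

    result = subprocess.run(["fstrim", "-v", "/mnt/cache"], capture_output=True, text=True)
    print(result.stdout or result.stderr)  # e.g. "/mnt/cache: N GiB trimmed"
    ```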

     

    History:

    • Upgraded from 6.6.6 to 6.7.2 shortly after release; had no issues
    • Two weeks ago, upgraded to 10GbE; no issues until now

     

    Hardware setup:

    • Intel i7-4790K
    • Asus Z97M-PLUS
    • 32GB RAM
    • LSI 9211-8i (Latest IT-Mode firmware, used with array drives)
    • 1x Parity drive
    • 4x Drives in array
    • 1x unassigned spare 512GB SSD for scratch use and other VMs
    • (originally) 2x SanDisk SSD Plus 1TB in btrfs RAID 1 for cache
    • (now) 1x SanDisk SSD Plus 1TB, XFS, for cache
    • Aquantia AQtion 10G Pro 10GbE NIC (AQN-107)

     

    Any help would be highly appreciated; I will try to post the debug data when I can tonight.

     

    It's very odd and confusing that everything was working fine up until now and then went bad with no outward changes or visible errors. I'd like to go back to using a redundant (RAID 1) cache if possible.