Chia farming, plotting; array and unassigned devices


Shunz


4 threads per plot, but also only 4 plots instead of 8 simultaneously.

 

I noticed that too about the CPU usage.  I have a theory on that, but I won't have evidence until I run my next test.

One thing I observed is that only 2 of the 4 'chia' processes would ever show 20% CPU at a time; the other 2 would show 6-10%, and then they would flip.  IIRC, when I ran my first 8-plot test I was seeing 11-12% on 4 at a time, and 5-6% on the other 4, then they would flip.  It'll be interesting to see what happens next.

 

The 2nd temp dir gets used in phase 3, when the final plot tmp file is created, and phase 4, when the checkpoints are added.  My guess is drive thrashing is minimized.  I was just watching the reported drive throughput on the 'harvester' drive, and it looked like at least a 250MB/s average write rate. I saw it holding north of 350 for a while there, with 100% duty cycle, but I think that's because 2 plots were getting their final move onto that disk.  I also checked memory and there was a couple-GB chunk that was tagged basically as disk cache.  I've got 144GB RAM; it can use as much as it wants for that. :D

 

My next run will be 8 plots, no -2 drive, 4 threads each. That's technically oversubscribing the processors, but I want to see what happens.  So far though, 8 in 24 is faster than 4 in 15.


initial notes on the new 8-plot 4-thread run:

CPU % times for all 8 processes are bouncing from 8-12%, which seems appropriate.

Disk speeds for all 8 disks are peaking over 350MB/s and are able to hold that write rate (SAS 15k DP FTGDMFW :D )

 

F1 compute times have no discernible pattern; 8p/2t was 520s, 4p/4t was 781s, and this 8p/4t was 621s.

 

Theory: if it's actually viable to stagger-start processes on this hardware, since all 8 don't seem to process at the same rate, for nondeterministic reasons, I could just kick off 8 looping processes and let them naturally drift apart, sort of like SpaceX's Starlink satellites after deployment. :D 
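The drift idea can be sketched with a toy calculation. Nothing below comes from the chia tooling; it just reuses four of the "Total time" values from my logs as per-worker plot durations and shows how far apart the loop start times spread after each pass:

```python
def simulate_loops(durations_per_worker, n_plots):
    """Start times for workers that each loop plots back-to-back.

    Worker i starts its n-th plot at n * duration_i, so any difference
    in per-plot duration compounds into a growing stagger.
    """
    return [[d * i for d in durations_per_worker] for i in range(n_plots)]

# Per-plot durations (seconds), sampled from the logged "Total time" values
durations = [77962, 78384, 84708, 85716]
starts = simulate_loops(durations, 3)
spread = [max(s) - min(s) for s in starts]
# spread grows by (85716 - 77962) = 7754 s (~2.15 h) every pass, so the
# workers de-synchronize on their own without any explicit stagger.
```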

Just now, sota said:

tons.

 

Yeah, kinda confirms my suspicions that SSDs are far from required to plot. I have some Samsung 845DC Pro SSDs from 2014 with 8TB of endurance each that came in a server I picked up off Craigslist a while back. I just don't know if it is worth using them for plotting though, as I am not sure the performance will be much better than doing a whole bunch in parallel with regular drives.

 

Properly set up, I will be limited by drive bays if I decide to really go at this. I have a lot of old small drives lying around that I will never actually use for anything. If they die, who cares.

 

These SSDs though I want to use as my cache on unraid; just got to 3D print a holder for them to fit in the server.


given how much abuse an SSD will take from this (3TB total r/w each plot) it seems a bit silly.

 

I kinda wish I had a machine here with a LOT more ram available... I'd try and set up a 300GB RAM drive. :D

 

From watching the processor loading though, I get the feeling the program has some serious speed humps in place, that prevent it from really cooking along.

1 hour ago, sota said:

given how much abuse an SSD will take from this (3TB total r/w each plot) it seems a bit silly.

 

I kinda wish I had a machine here with a LOT more ram available... I'd try and set up a 300GB RAM drive. :D

 

From watching the processor loading though, I get the feeling the program has some serious speed humps in place, that prevent it from really cooking along.

 

Lol, funny you should mention that, I found some cheap ECC DDR3 the other day on eBay and ordered 256GB of it. Combined with the 128GB I already had here, I could put 384GB into one of my servers. It was hard not to, but I didn't want to mess with my main server as it is nice and reliable.

 

From reading though, RAM drives only see a ~10% improvement over SSD/NVMe; the real limit is single-thread CPU speed. Which is why I am surprised how everyone makes it sound like you must use SSDs to plot or it won't work.

 

I suppose on a "normal" computer with 4 cores, you need to maximize each plot's speed vs overall plotting speed in parallel. In that case it would make a difference, but still not worth it from what I have seen. I mean, a single plot doesn't even stress spinning rust.

 

I am in the process of moving all the finished plots around (figure why waste them). Then going to shuffle some drives around and try loading the system down with even more parallel plots.


robocopy "source" "destination" *.plot /mov /mon:1

 

using that little nugget now. :D

 

I also, for giggles, spun up some REALLY decrepit computers... HP dc7900 and 8000 Elite USDTs.

one of them just posted a plot that I started almost 4 days ago. :D


Final is "slow" storage comparatively, and the goal would be to return the drives/process back to making more plots.

Put the final move in the background, basically.  I wish chia had a flag option to leave the final plot in the temp folder, and let people move them manually.

Plus the final drive will change as it fills up.

 


 

1 minute ago, sota said:

Related to what we're both discussing casually.

 

 

Yeah, I have preached for years to minimize hard drive usage on drives you care about to extend their life.

 

I have a lot of drives that are 10+ years old and still running well, mostly because I don't abuse them.

 

At this point those drives are old enough that I have accepted I will NEVER use them again, so might as well get some use out of them this way. If they die, it is finally an excuse to throw them out lol

 

Any old drives I care about (basically 4tb+) will be for farming not plotting.

7 minutes ago, TexasUnraid said:

Agreed, I also wish it could just leave the files in the temp folder or MOVE them to another folder. Right now it's wasting time copying them on the same drive for me to come along and clean up later.

 

Yeah, I think the sweet spot for threads is around 4-6 based on my limited testing.

Yea, if you could get it to leave the file in whatever you place as -2 (since that's where it builds the final file anyways; it's just the same as -t if you don't specify it), then I could dynamically change the rolling script to move to the next storage drive. Or, and this is where unRAID could play a role: load up a bunch of disks into a no-parity array, point -2 to the share/mount for all the disks, let them fill up over however long, pull 'em out, make a new array with new disks, rinse and repeat. :D

Or just keep buying/building 30-disk unRAID boxes :D 

 

The ability to not move the .plot file multiple times would be sweet.
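The "move to the next storage drive" part of that rolling script could be as simple as a first-fit picker. The mounts and free-space numbers below are made up for illustration; in practice the free bytes would come from something like shutil.disk_usage:

```python
K32_PLOT_BYTES = 109_000_000_000  # a final k32 plot is roughly 101.4 GiB

def next_destination(drives, plot_bytes=K32_PLOT_BYTES):
    """Return the first mount with room for one more plot, else None.

    drives: ordered list of (mount, free_bytes) pairs, checked in order,
    so one drive keeps filling until it can't hold another plot.
    """
    for mount, free in drives:
        if free >= plot_bytes:
            return mount
    return None

# Hypothetical drive map: disk1 is nearly full, disk2 still has room.
drives = [("/mnt/disk1", 40_000_000_000),
          ("/mnt/disk2", 500_000_000_000)]
```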


[screenshot: disk write throughput]

 

Nearing the end of phase 3 right now.  That's one of the lower write-speed peaks I've seen; 350MB/s isn't rare.

What I still find interesting is the inconsistency of phase completion across processes.  It doesn't seem to trend with any specific disk or executing thread.  The only thing I can think of is that the system's background housekeeping/foreground interface processes are asymmetrically latching on to specific cores/hyperthreads and causing contention.

Phase 3 should end within 30 minutes from now, and I should have the completed data from this run in by 7am, according to my estimates.

 


Here's what I mean by the inconsistency.  In every case, the group of plots was started within a second of each other, via script.

I highlighted the "flyers" in each group.

 

8p2t (lost a log file someplace)
Starting phase 1/4: Forward Propagation into tmp files... Sat May 22 13:45:06 2021
Total time = 85716.351 seconds. CPU (122.120%) Sun May 23 13:33:43 2021
Total time = 85863.223 seconds. CPU (122.130%) Sun May 23 13:36:23 2021
Total time = 78384.231 seconds. CPU (123.310%) Sun May 23 11:31:52 2021
Total time = 84997.331 seconds. CPU (122.100%) Sun May 23 13:22:21 2021
Total time = 77962.652 seconds. CPU (124.000%) Sun May 23 11:25:14 2021
Total time = 85872.535 seconds. CPU (122.010%) Sun May 23 13:38:11 2021

missing

Total time = 84708.527 seconds. CPU (122.320%) Sun May 23 13:18:57 2021

 

4p4t

Starting phase 1/4: Forward Propagation into tmp files... Sun May 23 16:07:06 2021
Total time = 59364.822 seconds. CPU (153.660%) Mon May 24 08:36:30 2021
Total time = 58580.536 seconds. CPU (154.300%) Mon May 24 08:25:34 2021
Total time = 54933.449 seconds. CPU (149.910%) Mon May 24 07:24:47 2021
Total time = 59703.949 seconds. CPU (153.280%) Mon May 24 08:44:17 2021

 

8p4t

Starting phase 1/4: Forward Propagation into tmp files... Mon May 24 09:07:54 2021

Total time = 80690.536 seconds. CPU (127.290%) Tue May 25 07:32:44 2021
Total time = 79739.562 seconds. CPU (128.200%) Tue May 25 07:16:53 2021
Total time = 76036.094 seconds. CPU (131.050%) Tue May 25 06:15:10 2021
Total time = 76768.785 seconds. CPU (130.680%) Tue May 25 06:27:23 2021
Total time = 75913.641 seconds. CPU (131.380%) Tue May 25 06:13:08 2021
Total time = 79979.292 seconds. CPU (128.010%) Tue May 25 07:20:53 2021
Total time = 76380.811 seconds. CPU (130.990%) Tue May 25 06:20:55 2021
Total time = 79589.050 seconds. CPU (127.680%) Tue May 25 07:14:23 2021

 

Actually, now that I look at this on-screen like this, the 3rd and 5th in each group were flyers.  The 6th appears to be one of the slowest.

I wonder why the discrepancy.
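Spotting the flyers is easier with a quick log scrape; this is just a regex over the "Total time" lines (the sample reuses two lines from the 4p4t group above):

```python
import re

LINE_RE = re.compile(r"Total time = ([\d.]+) seconds\. CPU \(([\d.]+)%\)")

def parse_totals(log_text):
    """Extract (seconds, cpu_percent) pairs from plotter log output."""
    return [(float(s), float(c)) for s, c in LINE_RE.findall(log_text)]

sample = """\
Total time = 59364.822 seconds. CPU (153.660%) Mon May 24 08:36:30 2021
Total time = 54933.449 seconds. CPU (149.910%) Mon May 24 07:24:47 2021
"""
times = [t for t, _ in parse_totals(sample)]
spread_hours = (max(times) - min(times)) / 3600  # gap between flyer and laggard
```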

 


 

 

switching 8p2t to 8p4t shaved 1.45 hours off the total time for 8 plots.

4p4t is the fastest through P1 and P2.

-2 definitely helps P3 and P4 to charge through.

 

Working through some mental math:

If I had an ideal setup (4 plots, 4 threads each, started every 10 hours to get past P2), it'd be 16 hours for the 1st 4 plots to arrive, then 4 plots every 6 hours after that. That's 16 plots every 24 hours. I'd need 8 P1/P2 disks, and probably 4 P3/P4 disks.  I could maybe make the P1/P2 start time every 8 hours, the assumption being that I can steal "time" during P2 for the new P1, but the big problem is NOT affecting the P3/P4 time window.  That's the one that'll determine how fast my plotting can go.
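That back-of-the-envelope throughput is just batch size times batches per day. A one-liner makes it easy to try other cadences; this assumes steady state, i.e. one batch finishing every N hours, and ignores the 16-hour ramp-up for the first batch:

```python
def plots_per_day(batch_size, cadence_hours):
    """Steady-state plot throughput once the pipeline is primed:
    one batch of batch_size plots completes every cadence_hours."""
    return batch_size * 24 / cadence_hours

rate_6h = plots_per_day(4, 6)  # the 16-plots-per-day case above
rate_8h = plots_per_day(4, 8)  # a more conservative 8-hour cadence
```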

 

Really I need more cores/threads, but unless someone wants to send me a pair of X5680 chips for free, that ain't happening. :D 

 

 

 


Think I'll take some "time off" from plotting, to let the harvester drive clear out (my final plots drive is on a USB 2.0 connection for the moment.  Yea, that sucks BIG time.)

The cable to connect the MD1000 should arrive today, so I'd need to take the farm down anyways to install that, after I test it in the other box.

I can also swap out the 6x 146GB drives for 300GB drives, AND since I've figured out how to use the SATA port in the DL380 G6 to run a HDD instead of a CD-ROM drive (hint: small adapter), I'm going to try and backup/restore/clone the farm's OS drive from the 2x72GB array onto a small SATA drive, thereby freeing up 2 more slots in the server.  That'll give me 16x 300GB SAS DP 15k to play with.


I noticed some inconsistency between plots, but not nearly that bad. Total time was generally within a few minutes of each other; honestly I would not expect any better.

 

The biggest conclusion I have come to is that plotting is CPU-limited more than anything, and parallel plotting to max out the CPU will generally give you the best results, as long as you are not pushing significantly past the IO limits of the drive (which, as you can see, takes much more than a single plot to do).

 

I also just pulled apart my test rig and shuffled some drives around; also waiting on a cable. Gonna try setting up a JBOD with the unused drive bays on another chassis to add some more disks. Should allow me to max out the CPU with some more drives.


(Earlier) Well, I lied.

Decided to kick off a 4p4t (no -2).  Want a baseline of what that does solo, to be able to compare to a staggered pair of 4p4t.

 

And I just finished testing and populating the MD1000.  Works perfectly with an LSI SAS9200-8e controller.  Now I can see 60TB of raw storage through that. :D

 

Now I need to finalize what machine is going to manage it, and become the Farm.

Thinking this old HP 8000 SFF if I can make it work again.


Had another thought.

What if the slowdown on 8 plots isn't the disks, but the controller getting saturated?

8p4t were all done on the same controller. It's an HP P420 w/ 1GB cache, but what if the traffic was enough to saturate it somehow?  I don't think anyone knows what style of I/O Chia is making; small or big chunks, basically.  If it's a metric ton of little chunks, that might flood the controller with enough IOPS to cause it to slow down.

 

Now that I have the MD1000 working (60TB raw storage, just waiting to get filled), and the 8000 Elite SFF is a viable choice as a farm-only box, once I move the Farm to it I'll experiment more with splitting plot groups across controllers: 8 300GB disks on each controller, do a pair of 4p4t -2 runs, with the -2 targets being on the opposite controllers.

9 hours ago, sota said:

had another thought.

what if the slow down on 8 plots, isn't the disks, but the controller getting saturated.

8p4t were all done on the same controller. it's an HP P420 w/ 1GB cache, but what if the traffic was enough to saturate it somehow.  I don't think anyone knows what style I/O Chia is making; small or big chunks basically.  If it's a metric ton of little chunks, that might flood the controller with enough IOPS to cause it to slow down.

 

Now that I have the MD1000 working (60TB raw storage, just waiting to get filled), and the 8000 elite SFF is a viable choice as a Farm Only box, once I move the Farm to it I'll experiment more with splitting plot groups across controllers.  8 300GB disks on each controller, do a pair of 4p4t-2 runs, with the -2 targets being on the opposite controllers.

 

Interesting idea, although seeing the actual IO load to the drives as fairly mundane compared to a lot of the database workloads these controllers were designed for, I kinda doubt the controller is an issue.

 

On Linux I am able to log the IO of the drives; technically I can see the micro-scale IO, but I don't really understand those numbers. On the macro scale, it seems to be large chunks, as it will read a few hundred MB and then write a few hundred, etc.

 

BUT it is possible that the dirty-write settings combined with my extra memory are what allow it to combine a bunch of small writes into those larger writes.


I've also seen some noise that, for machines that don't have a TON of cores or stupid-fast SSDs, instead of throwing MORE threads at a plot, you should throw LESS, down to a single one. Only one phase seems to benefit from having multiple cores, and since throwing cores at it doesn't scale with a whole multiplier in terms of time reduction on that phase, it might make sense.
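The logs earlier in the thread back that up: going from 2 to 4 threads per plot in the 8-parallel runs only moved the slowest plot from ~85873s to ~80691s. A quick ratio check (numbers copied from those "Total time" lines):

```python
# Slowest "Total time" in each 8-parallel run, from the posted logs
t_8p2t = 85872.535  # 8 plots, 2 threads each
t_8p4t = 80690.536  # 8 plots, 4 threads each

gain = t_8p2t / t_8p4t  # ~1.06: doubling the threads bought only ~6%
```

So the thread count doubled but the run got nowhere near 2x faster, which is exactly the sublinear scaling being described.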

