Chia farming, plotting; array and unassigned devices


Shunz

Recommended Posts

Also, something else just struck me.

Since I'm still using Windows 10, I realized Defender could be getting in the way of performance if it's trying to intercept and analyse activity.  Turning it off seems like a good idea, but then I remembered they added ransomware protection not too long ago, and preventing a bad actor from screwing up the plots sounds worthwhile. So my 40W light bulb idea/question would be... can the plots live in a read-only location?

Link to comment
1 hour ago, sota said:

Also, something else just struck me.

Since I'm still using Windows 10, I realized Defender could be getting in the way of performance if it's trying to intercept and analyse activity.  Turning it off seems like a good idea, but then I remembered they added ransomware protection not too long ago, and preventing a bad actor from screwing up the plots sounds worthwhile. So my 40W light bulb idea/question would be... can the plots live in a read-only location?

 

I would just ditch Windows long term. Most report a 10-15% improvement in plotting speed on Linux, and I saw another test comparing NTFS, EXT4, and BTRFS that showed another 10-15% improvement moving to BTRFS, IIRC.

Link to comment
Just now, TexasUnraid said:

 

I would just ditch Windows long term. Most report a 10-15% improvement in plotting speed on Linux, and I saw another test comparing NTFS, EXT4, and BTRFS that showed another 10-15% improvement moving to BTRFS, IIRC.

That might be my long-term plan, but I'm not a *nix guy... I muddle along as best I can with unRAID and my VyOS router... so I'm going to stick with Windows farmers and harvesters for now.

Link to comment

Yeah, I only started using Linux last year myself. I won't be daily-driving it anytime soon, but for set-it-and-forget-it things like this it works well. Of course, I plan to use that GUI docker posted earlier to make things way easier.

 

Right now I am just playing with Chia; I see no point in going hardcore until pools are released and real-world earnings can be seen.

Link to comment

Same boat here.

All the gear I'm using right now is either old/busted/retired stuff, or a spare/backup server for my primary hosting box.

I did order a controller and cable ($50) to be able to attach this old Dell MD1000 chassis (15 LFF slots) to a box, and I'll be filling it with 4TB drives, since I have a large stash of them.

If this pans out as something potentially viable, then great. If not, I'm out $50 :D

 

Link to comment
7 minutes ago, sota said:

Same boat here.

All the gear I'm using right now is either old/busted/retired stuff, or a spare/backup server for my primary hosting box.

I did order a controller and cable ($50) to be able to attach this old Dell MD1000 chassis (15 LFF slots) to a box, and I'll be filling it with 4TB drives, since I have a large stash of them.

If this pans out as something potentially viable, then great. If not, I'm out $50 :D

 

 

Yep, I bought a few things, but they had already been in my cart for the last 6 months, and when I noticed prices skyrocketing and availability plummeting the other day, I realized I'd better buy them now or miss out altogether.

 

Take a look at eBay sometime: Supermicro 846 chassis that used to go for $200? Now the few that get listed go for $1500+, and they SELL! Almost enough for me to consider selling my backup chassis, if I thought I could replace it.

 

Everything I got so far except the 146GB 10K drives was stuff I needed/wanted anyway, so worst case for me, I am out $15 lol

Link to comment

I need to reboot my farming machine in a little while and tear apart the arrays.  Anyone know what happens if a plotter machine can't connect to the farm's network share when the time comes to move the completed plot?  I'm hoping it'll just leave the 100GB plot file on the plotter(s).

Link to comment

Hey all:

I have been searching and reading a ton about harvester/farming timing, including bug reports / GitHub issues about NAS lookup times. I am still a little unclear and was hoping someone would take a minute to help me understand it a bit better. Here is an example of my log:

2021-05-21T07:38:26.001 harvester chia.harvester.harvester: INFO 0 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.15400 s. Total 103 plots
2021-05-21T07:38:35.406 harvester chia.harvester.harvester: INFO 2 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.67500 s. Total 103 plots
2021-05-21T07:38:42.863 harvester chia.harvester.harvester: INFO 0 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.15500 s. Total 103 plots
2021-05-21T07:38:50.154 harvester chia.harvester.harvester: INFO 0 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.16802 s. Total 103 plots
2021-05-21T07:39:09.484 harvester chia.harvester.harvester: INFO 1 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 8.33100 s. Total 103 plots
2021-05-21T07:39:15.320 harvester chia.harvester.harvester: INFO 1 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 7.28700 s. Total 103 plots
2021-05-21T07:39:24.514 harvester chia.harvester.harvester: INFO 0 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.17500 s. Total 103 plots
2021-05-21T07:39:25.875 harvester chia.harvester.harvester: INFO 0 plots were eligible for farming 632e3571bf... Found 0 proofs. Time: 0.15203 s. Total 103 plots

 

What concerns me is that sometimes when plot(s) pass the filter, the time goes up. For example, the first hit with 2 plots is good at 0.675 s, but you can see a bit later I have 1 plot pass and the time is 8.331 s.

I am guessing the time is going up because, right now, I am farming on a machine with my plots share mounted via SMB.

 

Question 1: Am I correct in thinking this is all fine, given that it's still under 30 seconds (even though it's greater than 2)?

Question 2: Let's say I get lucky and one of my plots has a proof. Does the full plot have to be transferred over the SMB mount? I am pretty confident I can't transfer 100GB in under 30 seconds with my network setup from the unRAID array.

 

I am really just trying to determine whether I need to stop farming remotely, given these times, and farm directly on the unRAID server via a Docker container instead.

 

I know a bunch of us are off creating plots and storing them on our arrays, so I'm just trying to figure out whether we even have a chance at winning, given the speed of unRAID reads.

Link to comment
On 5/21/2021 at 9:43 AM, sota said:

I need to reboot my farming machine in a little while and tear apart the arrays.  Anyone know what happens if a plotter machine can't connect to the farm's network share when the time comes to move the completed plot?  I'm hoping it'll just leave the 100GB plot file on the plotter(s).

 

Pretty sure they will just stay where they are. I know it copies the plots rather than moving them: I have it set to put the finished plots in another folder on the same drive, and it still copies the data instead of moving it.

Link to comment

Just got the DDR3 I ordered off eBay; decided to toss it in the backup server to test it and try plotting with it before putting it into my main server. Doing the 4x4TB RAID0 with 8 plots again; the only change is 224GB vs 64GB of RAM. I set it to 6 cores per plot every time.

 

I just started it, but I'm already noticing much higher CPU usage this time around.

 

I still have the dirty writes set to 90%, and I can see it using the full 200GB of cache space in memory as well. I am guessing that is where the gains are coming from (anything overwritten in cache doesn't have to be written to disk twice).

 

Curious what the final results will be: does it keep helping the whole way through, or is it just a hitting-the-ground-running type of deal?

Link to comment

I went the other way and tore down my 8x300GB RAID0 array.  I started an octet of plots around 1:45pm, each with a different drive for the tmp files, but still with the 6x146GB RAID0 array as the final target.  I like how the CPU time listed for each Chia process is almost 15 hours now, yet the actual time it's been running is closer to 10 hours.  Must be getting some parallel processing in there somewhere. :D

Link to comment
12 hours ago, sota said:

I went the other way and tore down my 8x300GB RAID0 array.  I started an octet of plots around 1:45pm, each with a different drive for the tmp files, but still with the 6x146GB RAID0 array as the final target.  I like how the CPU time listed for each Chia process is almost 15 hours now, yet the actual time it's been running is closer to 10 hours.  Must be getting some parallel processing in there somewhere. :D

 

Interesting, what is the usage on each drive? With a single plot I had a lot of idle time on individual drives (in Windows, pretty sure it will just show less than 100% usage). I had to go to 3 plots per drive to saturate a single ~170MB/s hard drive (and saw improvements in plot time to back it up).

 

I finished the experiment from last night. Results are quite interesting.

 

Same 4x4TB array (technically this latest one is a little slower since the drives are more filled up); the only change was going from 64GB to 224GB of total system RAM. I also raised the per-plot memory usage to 6GB. In retrospect I should have left that alone: looking at the memory usage, I don't think it really used that much more memory for the plots, and most of the usage was for the cache.

 

Total time with 64GB RAM = 20 hours, for a 2.5-hour per-plot time

 

Total time with 224GB RAM = 16.6 hours, for a 2.1-hour per-plot time.

 

Very interesting results, for Linux anyway. I don't know that Windows would see the same gains, as its cache works a lot differently. This could be why people see faster plots on Linux.

 

In theory, with the 4x 4TB drives run individually and 3 plots in parallel on each, I could do 12 plots in 20 hours, for a net 1.7 hours per plot (20 / 12 ≈ 1.7).

 

Makes me wonder what my individual drive performance would look like now. It might be faster, but since I would be running more total plots in parallel, it's possible there wouldn't be enough RAM to go around and the performance wouldn't improve as much.

Link to comment

Since the disks are only 300GB each, I'm limited to 1 plot/drive. Agreed, there's a lot of idle time, and the disks could probably support more than 1 plot at a time in that respect (SAS DP 15K, they damn well should be able to! :D ), but the space problem prevents that.

Results are in for the 8 plots:

Longest plot time: 24 hrs

Shortest plot time: 22 hrs

Total data moved: 3TB

That was with all 8 plots started at the same time. No idea if staggering their start times will make a difference, since the temp folder for each plot has a dedicated drive.

 

If I can find some tuning suggestions for high-CPU-count/high-RAM machines, I'll play with them on the next pass.

 

I was playing around with my other DL380 G6 box and discovered I can connect and use a SATA HDD on the cable/port that usually goes to the CD-ROM drive.  If I do that, I can dedicate 16 disks to plotting.  I already have the extra 300GB disks here.  If I had 600GB disks I'd try 2 plots/drive, but I don't, and I'm not spending any $ at this point (more on that later).

I'm thinking if I stuff a 2TB SATA in there as a dump target for the plots, before they're moved to final storage, I should be able to get away with 16 plots in a pass.

 

Now, regarding any improvements involving $... I'm personally coming around to the opinion that us small-time farmers are screwed.  The network is growing too fast, and the time-to-win is getting worse by the second... exponentially worse.  While I'm not abandoning all hope yet, there's no way I'm going to put the finances behind trying to make a heavy go of this.  I'm going to use this as a platform to experiment, learn some things, and get some ideas fleshed out that I've thought about in the past (namely, using the Dell MD1000 box as a possible expansion for unRAID in the future, and the hardware needed to make that happen), but I'm not dumping 4 or 5 figures of cash into the hope I'll make any of it back.  Right now, I'm "out" $50 for a controller and a cable, and those would make the MD1000 useful in the future anyway, so not really a "loss" in that respect.

I *might* pick up some 300GB or 600GB disks, since I can use them for these servers' original role: backup hardware for my hosting server.  But the writing I'm seeing on the wall for Chia isn't good for the little guy. Maybe pools will change my opinion, but I'm not banking on it.

Edited by sota
Link to comment

Agreed, the people spending stupid amounts of money on plotting and farming parts are crazy at this point. I just don't understand spending $1500 on what was $250 last year. I am doing some upgrades (mostly backup- and redundancy-related) that have been on my to-do list for a long time, simply because I want to beat inflation.

 

None of the prices I have paid are what I would consider bad given the circumstances, just not as good as if I had purchased 6 months ago when I looked them up lol.

 

Good point on the 300GB limit, forgot about that. You could try RAIDing 2 or 3 drives together to run more plots, but it might not be worth it. You are CPU-limited for sure with your setup; the only way to get more out of it is to use Linux and run more in parallel. You can try allocating more cores to each plot, but only the first phase (everything before 39%) is multi-threaded, so it won't help with the rest.

 

Sounds like what you need is more drives / larger drives to run in parallel. Maybe a disk shelf / JBOD with those spare drives you said you had?

Link to comment

I thought about RAID0'ing a pair of the 300s, but that still wouldn't help much.  I doubt 2 plots in parallel on a single "drive" would really be any faster than 2 plots in parallel on individual drives.  I'm not seeing this thing as being that I/O-bound, as the disks are never shown as working that hard.  That is, unless someone has actual data showing that insane burst traffic is what the plotter wants to see.

 

As it stands now, and as I said, I'll probably get this thing set up to do 16 parallel plots on 16 individual drives (I'm guessing in a 24-hour period, as I don't see doubling the workload on this server even making it care, given the metrics I'm seeing), possibly get a 2nd box configured the same, and I might even try the whole Linux plotter idea on the 2nd box, just to learn something new.  At the very least I can then give an apples-to-apples comparison in that regard, since there are reports that Windows costs 5-10% in speed.

 

I'm actually kind of hoping that in a couple of weeks/months there will be a glut of relatively new but used 10/12/14/18/20TB disks on the market that I can snatch up for under $10/TB. :D  Yea, I'm one of those people.

Link to comment

lol, yeah, I would also like to see some cheap used hardware flood the market in a few months. I picked up so many cheap GPUs after the last crash that I outfitted all my family and friends.

 

FYI, plotting on Linux vs Windows is virtually identical; it uses the same GUI program on both (unless you are using an external plot manager, although those are also cross-platform for the most part).

 

Just install Ubuntu onto a spare drive, install Chia using the .deb file from the Chia website, and you are pretty much off to the races. I would recommend installing the btrfs package so you can use btrfs-formatted drives; it is a simple terminal command to install, and then you can use the GUI to format drives.
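If it helps, the whole thing on Ubuntu is roughly this (a minimal sketch; btrfs-progs is the package I mean, and /dev/sdX and the mount point are placeholders for your own drive):

# userspace tools for btrfs
sudo apt update && sudo apt install btrfs-progs

# format a spare drive as btrfs (wipes whatever is on /dev/sdX!) and mount it
sudo mkfs.btrfs -f /dev/sdX
sudo mkdir -p /mnt/plot_tmp
sudo mount /dev/sdX /mnt/plot_tmp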

 

The only other change I made was increasing my dirty_ratio to 90%, since I have spare memory. There are write-ups for doing that, but I can show you what I did if you need it.
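For anyone who wants it, what I ran was basically this (the dirty_ratio=90 is from my setup; the background ratio is just the value I happened to pick, so treat it as an example, not a recommendation):

# apply for the current boot
sudo sysctl -w vm.dirty_ratio=90
sudo sysctl -w vm.dirty_background_ratio=10

# make it stick across reboots
echo 'vm.dirty_ratio=90' | sudo tee -a /etc/sysctl.conf
echo 'vm.dirty_background_ratio=10' | sudo tee -a /etc/sysctl.conf

The idea is that dirty (not-yet-written) data can pile up in RAM to 90% before the kernel forces writeback, so temp files that get overwritten or deleted quickly may never hit the disk at all.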

Link to comment

There is one thing I just read some info about:

the -2 (second/alternate temp folder) option. That might be an idea to implement, as it's used during phase 3 (compress to file).

ex: Time for phase 3 = 28166.616 seconds. CPU (92.740%) Sun May 23 13:00:47 2021

Wonder what that would look like if -2 was on another disk.

Guess I'll run a quartet of plots with different/specified -2 targets and see what happens.

Some data might shed light on whether staggering plot start times, with a common -2 target, would net an overall speed increase worth the complexity involved.
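For reference, the syntax would just be the same create line with -2 pointed at a different disk; something like this (drive letters invented for illustration):

plots create -k 32 -b 4096 -u 128 -r 2 -t C:\temp\1 -2 E:\temp2\1 -d C:\Harvests -n 1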

 

Some other data from the same log:

Total time = 85716.351 seconds. CPU (122.120%) Sun May 23 13:33:43 2021
Copy time = 917.057 seconds. CPU (18.120%) Sun May 23 13:49:00 2021
 

So if I could make a substantial dent in the phase 3 time, that might be worth it, since it accounted for roughly a third (32.9%) of the total processing time.

Edited by sota
Link to comment

Some more data, from my 8-plot parallel run:

These were all done with: plots create -k 32 -b 4096 -u 128 -r 2 -t C:\temp\$num -d C:\Harvests -n 1
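For anyone following along, my understanding of those flags is:

plots create
    -k 32            (plot size; k=32 is the standard ~100GB plot)
    -b 4096          (working buffer, in MiB)
    -u 128           (number of sort buckets)
    -r 2             (threads; mostly only phase 1 benefits)
    -t C:\temp\$num  (temp dir, where the bulk of the I/O happens)
    -d C:\Harvests   (final destination for the finished plot)
    -n 1             (number of plots to create back to back)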

 

---------- C:\TEMP\1\PLOT1.LOG
Time for phase 1 = 44720.749 seconds. CPU (148.630%) Sun May 23 02:10:27 2021
Time for phase 2 = 10853.645 seconds. CPU (95.530%) Sun May 23 05:11:21 2021
Time for phase 3 = 28166.616 seconds. CPU (92.740%) Sun May 23 13:00:47 2021
Time for phase 4 = 1975.339 seconds. CPU (86.720%) Sun May 23 13:33:43 2021

---------- C:\TEMP\2\PLOT1.LOG
Time for phase 1 = 45075.196 seconds. CPU (148.010%) Sun May 23 02:16:35 2021
Time for phase 2 = 10780.690 seconds. CPU (95.900%) Sun May 23 05:16:15 2021
Time for phase 3 = 27969.140 seconds. CPU (93.110%) Sun May 23 13:02:25 2021
Time for phase 4 = 2038.196 seconds. CPU (86.780%) Sun May 23 13:36:23 2021

---------- C:\TEMP\3\PLOT1.LOG
Time for phase 1 = 42908.333 seconds. CPU (149.070%) Sun May 23 01:40:37 2021
Time for phase 2 = 9027.629 seconds. CPU (95.950%) Sun May 23 04:11:04 2021
Time for phase 3 = 24550.606 seconds. CPU (91.330%) Sun May 23 11:00:15 2021
Time for phase 4 = 1897.660 seconds. CPU (84.620%) Sun May 23 11:31:52 2021

---------- C:\TEMP\5\PLOT1.LOG
Time for phase 1 = 44045.933 seconds. CPU (149.110%) Sun May 23 01:59:50 2021
Time for phase 2 = 10892.142 seconds. CPU (95.000%) Sun May 23 05:01:22 2021
Time for phase 3 = 28165.784 seconds. CPU (92.690%) Sun May 23 12:50:48 2021
Time for phase 4 = 1893.469 seconds. CPU (87.270%) Sun May 23 13:22:21 2021

---------- C:\TEMP\6\PLOT1.LOG
Time for phase 1 = 42901.266 seconds. CPU (149.300%) Sun May 23 01:40:53 2021
Time for phase 2 = 8992.552 seconds. CPU (96.250%) Sun May 23 04:10:45 2021
Time for phase 3 = 24199.397 seconds. CPU (92.320%) Sun May 23 10:54:05 2021
Time for phase 4 = 1869.429 seconds. CPU (87.010%) Sun May 23 11:25:14 2021

---------- C:\TEMP\7\PLOT1.LOG
Time for phase 1 = 44961.929 seconds. CPU (148.270%) Sun May 23 02:16:20 2021
Time for phase 2 = 10770.574 seconds. CPU (95.660%) Sun May 23 05:15:51 2021
Time for phase 3 = 28123.101 seconds. CPU (92.700%) Sun May 23 13:04:34 2021
Time for phase 4 = 2016.929 seconds. CPU (85.950%) Sun May 23 13:38:11 2021

---------- C:\TEMP\8\PLOT1.LOG
Time for phase 1 = 44217.178 seconds. CPU (149.020%) Sun May 23 02:04:06 2021
Time for phase 2 = 10451.128 seconds. CPU (95.020%) Sun May 23 04:58:17 2021
Time for phase 3 = 28136.283 seconds. CPU (92.860%) Sun May 23 12:47:13 2021
Time for phase 4 = 1903.936 seconds. CPU (87.310%) Sun May 23 13:18:57 2021
 

Given this is an 8-core/16-thread system, I'm not sure there's much more I can do for phase 1.

I'm running a 4-plot parallel run, no stagger, right now: plots create -k 32 -b 4096 -u 128 -r 4 -t C:\temp\$num -2 C:\temp\$num4 -d C:\Harvests -n 1

 

Yea, I know, I changed 2 things (-r and -2), so let's see.

Edited by sota
Link to comment

Interested in what the 2nd temp dir does; I saw that option but never looked into what it actually does.

 

What does your CPU usage look like during plotting? You should be able to overprovision the CPU unless it was already hitting 100%. It won't help a ton, but it might let you squeeze out a little more performance.

 

I also have 16c/32t, and I generally only see 50% usage in phase one with 6 cores per plot. There is still a single-thread limiting factor for sure.

Link to comment

Funny you should ask...

 

---------- PLOT1.LOG
F1 complete, time: 757.709 seconds. CPU (72.7%) Sat May 22 13:57:44 2021
Forward propagation table time: 6293.351 seconds. CPU (152.920%) Sat May 22 15:42:38 2021
Forward propagation table time: 7105.469 seconds. CPU (148.590%) Sat May 22 17:41:03 2021
Forward propagation table time: 8284.036 seconds. CPU (148.720%) Sat May 22 19:59:07 2021
 

---------- PLOT.LOG
F1 complete, time: 464.498 seconds. CPU (180.27%) Sun May 23 16:14:50 2021
Forward propagation table time: 3739.719 seconds. CPU (244.620%) Sun May 23 17:17:10 2021
Forward propagation table time: 4467.289 seconds. CPU (215.370%) Sun May 23 18:31:37 2021
Forward propagation table time: 5242.561 seconds. CPU (213.940%) Sun May 23 19:59:00 2021
 

Some log snippets.

 

PLOT1 was part of the 8-plot parallel group, 2 threads each

PLOT is part of the 4-plot parallel group running now, 4 threads each

 

Not *quite* apples to apples, but it presents some interesting questions.

I'm going to let this 4-plot run complete so I can gather all the data about it, then consider doing an 8-plot run with 4 threads instead of 2.

 

One of my considerations is that I don't have much automation/scripting done yet to handle kicking off new plots and moving the resultant files automatically.  I can obviously develop that, but it's a question of whether I want to.
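If I do build it, the moving half is probably just a scheduled sweep; on a Linux box it could be as dumb as this (paths are placeholders, and it leans on the plotter writing *.tmp files and only renaming to *.plot once a plot is complete, so a bare *.plot is always safe to move):

# sweep finished plots off the local dump drive to the farm share once a minute
while true; do
  mv /mnt/plot_tmp/*.plot /mnt/farm/ 2>/dev/null
  sleep 60
done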

Link to comment

Yeah, I noticed that using more threads seems to help as well. It was reported that too many threads can reduce performance, but that was something like 12 threads per plot. I have been using 6 threads so far, although CPU usage doesn't seem much higher than with 4 threads.

 

There are several plot managers out now that will handle the automation side of things for you; Plotman and Swar are the 2 most popular I have seen. Might look into those.

Link to comment

So, an interesting result:

The 1st of the quartet ended a little while ago.

Starting phase 1/4: Forward Propagation into tmp files... Sun May 23 16:09:13 2021
Total time = 54933.449 seconds. CPU (149.910%) Mon May 24 07:24:47 2021
elapsed: 15:15:34

 

Added 2 trimmed log files so you can see the time differentials.

 

 

0.txt 1.txt

Link to comment
