Chia farming, plotting; array and unassigned devices


Shunz

Recommended Posts

after the absolute walloping WD got for their CMR/SMR shenanigans, I doubt any major manufacturer will obfuscate their drives' recording tech again any time soon. :D

plus I've read enough white papers to know that 20TB CMR in a 3.5" form factor is about as far as they can go. Anything bigger is going to have to be SMR, barring some major breakthrough that will get touted massively.

Link to comment

I can also now say with absolute certainty that looking at the intermediate time stamps of the various stages is utterly useless for attempting to glean any meaningful estimate of completion time or performance.

I've got plots hitting the same marker points that were started 30 minutes later but are ending 10+ minutes sooner!

Link to comment

I was hoping to get at least a rough idea of when I can run the next metric extraction. Guess not.

 

oh... and did you know... you can copy the whole '.chia' folder between installs, to at least NOT have to rebuild from scratch if needed? I figured that out recently.
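
A rough sketch of the copy (both profile paths are placeholders of mine, not anything official):

import shutil

# Copy the whole .chia folder (config plus the synced databases) from the
# old install's user profile to the new one. copytree wants the target to
# not exist yet, so run this before launching chia on the new box.
shutil.copytree(r"C:\Users\olduser\.chia", r"C:\Users\newuser\.chia")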

Edited by sota
Link to comment
1 minute ago, sota said:

I was hoping to get at least a rough idea of when I can run the next metric extraction. Guess not.

 

oh... and did you know... you can copy the whole '.chia' folder between installs, to at least NOT have to rebuild from scratch if needed? I figured that out recently.

 

I sort of figured that out, and I'm planning on trying it after it finishes syncing when setting up the 2nd server.

Link to comment

well the #s aren't looking good.

Despite F1 getting cut in half...

16 plots, 2 threads each, 10 minute stagger is trending on par with the 16 plot / 1 thread / no stagger run for phases 1 and 2; in fact it's actually worse. Ironically, 16p1t netted the lowest average time per plot.

 

Also not helping that the divergence is all over the map. Look at this phase 2 raw data:

Time for phase 2 = 13220.033 seconds. CPU (83.780%) Mon May 31 05:22:54 2021
Time for phase 2 = 13316.910 seconds. CPU (84.310%) Mon May 31 06:57:34 2021
Time for phase 2 = 14220.194 seconds. CPU (86.660%) Mon May 31 10:13:00 2021
Time for phase 2 = 14019.073 seconds. CPU (87.010%) Mon May 31 10:32:41 2021
Time for phase 2 = 14285.967 seconds. CPU (85.510%) Mon May 31 09:49:14 2021
Time for phase 2 = 14032.774 seconds. CPU (86.510%) Mon May 31 10:16:04 2021
Time for phase 2 = 13596.761 seconds. CPU (88.550%) Mon May 31 10:48:26 2021
Time for phase 2 = 13008.823 seconds. CPU (85.090%) Mon May 31 07:40:23 2021
Time for phase 2 = 13336.768 seconds. CPU (83.860%) Mon May 31 08:16:38 2021
Time for phase 2 = 13390.092 seconds. CPU (84.170%) Mon May 31 08:41:18 2021
Time for phase 2 = 13803.076 seconds. CPU (87.850%) Mon May 31 10:57:42 2021
Time for phase 2 = 13712.558 seconds. CPU (87.930%) Mon May 31 11:07:54 2021
Time for phase 2 = 13412.157 seconds. CPU (89.040%) Mon May 31 11:48:23 2021
Time for phase 2 = 13329.389 seconds. CPU (85.170%) Mon May 31 09:14:13 2021
Time for phase 2 = 13551.145 seconds. CPU (88.570%) Mon May 31 11:20:38 2021
Time for phase 2 = 13437.263 seconds. CPU (84.400%) Mon May 31 09:13:53 2021
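
For what it's worth, a quick Python one-off (the log folder and filename pattern are placeholders) puts numbers on that spread:

import re
import statistics
from pathlib import Path

# Scrape "Time for phase 2 = N seconds" out of every plot log.
times = []
for log in Path("plotlogs").glob("*.log"):  # placeholder location
    for line in log.read_text(errors="ignore").splitlines():
        m = re.search(r"Time for phase 2 = ([\d.]+) seconds", line)
        if m:
            times.append(float(m.group(1)))

if len(times) >= 2:
    print(f"n={len(times)}  min={min(times):.0f}s  max={max(times):.0f}s  "
          f"mean={statistics.mean(times):.0f}s  stdev={statistics.stdev(times):.0f}s")

On the 16 runs above, that's nearly 1300 seconds between fastest and slowest.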

 

Starting to think I'm going to be better off kicking off X loop processes and having each iterate whenever its respective plot ends, and to hell with trying to "manage" the timing.
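
Something like this rough sketch is what I mean (drive paths, plot counts, and thread count are all placeholders, and it assumes the chia CLI is on PATH):

import subprocess
from concurrent.futures import ThreadPoolExecutor

TEMP_DRIVES = [f"D:\\plots\\{n}" for n in (11, 12, 13, 14, 15, 16, 17, 18)]
DEST = "E:\\farm"
PLOTS_PER_DRIVE = 4  # how many plots each loop should crank out

def plot_loop(tmp_dir: str) -> None:
    # One worker per temp drive; each starts a fresh plot the moment its
    # previous one exits. No global stagger management at all.
    for _ in range(PLOTS_PER_DRIVE):
        subprocess.run(["chia", "plots", "create",
                        "-k", "32", "-r", "2",
                        "-t", tmp_dir, "-d", DEST],
                       check=True)

with ThreadPoolExecutor(max_workers=len(TEMP_DRIVES)) as pool:
    for drive in TEMP_DRIVES:
        pool.submit(plot_loop, drive)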

Link to comment

Most of the managing advice I have seen says to ignore time and instead focus on phases.

 

E.g., start another plot when the current plot reaches phase 2, start another parallel plot when the first plot reaches phase 4, etc.
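
A minimal sketch of that trigger (the "Starting phase 2/4" marker and every path here are assumptions about the plotter's log output):

import subprocess
import time

PHASE_2_MARKER = "Starting phase 2/4"  # assumed phase-boundary line in the plot log

def wait_for_phase_2(log_path: str) -> None:
    # Poll the plot log until the phase-2 marker shows up.
    while True:
        with open(log_path, errors="ignore") as f:
            if PHASE_2_MARKER in f.read():
                return
        time.sleep(30)

# Start plot A, wait for it to enter phase 2, then start plot B.
with open("plot_a.log", "w") as log_a:
    subprocess.Popen(["chia", "plots", "create", "-t", "D:\\tmp_a", "-d", "E:\\farm"],
                     stdout=log_a, stderr=subprocess.STDOUT)
wait_for_phase_2("plot_a.log")
subprocess.Popen(["chia", "plots", "create", "-t", "D:\\tmp_b", "-d", "E:\\farm"])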

 

I did some testing with timing, and 10 minutes is not nearly enough in my experience. I tried 2-3 hour splits (trying to time it around when phase 1 would finish) and saw some good results at first, but like you, as more plots were added the times slowed down and plots didn't actually leave phase 1 before the next one started.

 

Watching the drive usage, phases 1 and 2 seem to have very different patterns: phase 1 is a lot of steady-state usage, while phase 2 alternates between reads and writes.

 

Plotman handles all of this automatically; I will be using the docker build of it to handle things.

 

It says it works on Windows, so I might try setting that up; it also handles multiple machines and setting up plotters vs. harvesters, etc.

Edited by TexasUnraid
Link to comment

well, I won't be finding out the results of this test.

had to move the power cord of the UPS everything is plugged into, and for some reason the UPS puked when I pulled the mains. but ONLY the DL380 barfed... all the PCs were unaffected. FML.

Link to comment
11 hours ago, sota said:

for some reason the UPS puked when I pulled the mains.

If you disconnected the cord from the wall, that's normal for most UPSes; they aren't designed to operate without a constant ground reference, especially if there is a path to ground through other parts of the equipment like network or other cables. Be glad the UPS shut down instead of dumping full power through your stuff and letting the magic smoke out, like mine did many years ago when I found this out the hard way.

Link to comment
7 minutes ago, jonathanm said:

If you disconnected the cord from the wall, that's normal for most UPSes; they aren't designed to operate without a constant ground reference, especially if there is a path to ground through other parts of the equipment like network or other cables. Be glad the UPS shut down instead of dumping full power through your stuff and letting the magic smoke out, like mine did many years ago when I found this out the hard way.

 

Interesting; in all my years I never knew about that or experienced it. I will admit I have unplugged a UPS countless times over the years for various reasons and never had an issue.

 

'Course, most of the UPSes I have worked with are consumer-grade units with desktops.

Link to comment
39 minutes ago, TexasUnraid said:

 

Interesting; in all my years I never knew about that or experienced it. I will admit I have unplugged a UPS countless times over the years for various reasons and never had an issue.

 

'Course, most of the UPSes I have worked with are consumer-grade units with desktops.

It's like playing Russian roulette. If the connected equipment is totally isolated or has another good path to ground, you generally are fine. If the only path to ground is through a monitor cable or a network cable, it's possible you could have a bad day. Some UPSes handle it better than others, so it's very much YMMV.

Link to comment
1 minute ago, jonathanm said:

It's like playing Russian roulette. If the connected equipment is totally isolated or has another good path to ground, you generally are fine. If the only path to ground is through a monitor cable or a network cable, it's possible you could have a bad day. Some UPSes handle it better than others, so it's very much YMMV.

 

Makes sense, just surprised how many times I got lucky lol. I have had some pretty janky setups on UPSes over the years (part of why they were unplugged so many times lol).

Link to comment
2 hours ago, jonathanm said:

If you disconnected the cord from the wall, that's normal for most UPSes; they aren't designed to operate without a constant ground reference, especially if there is a path to ground through other parts of the equipment like network or other cables. Be glad the UPS shut down instead of dumping full power through your stuff and letting the magic smoke out, like mine did many years ago when I found this out the hard way.

I've honestly never had a UPS puke like that, ever. Now, this unit, and a twin I have in storage, were pulled out of service after over 20 years of use because the battery gauge gets weird once the load goes over 30%, yet it would still power the load for the appropriate estimated time. I'm taking this as proof that I should *permanently* retire both of these old units. Oh, they're all APC SmartUPS units.

Edited by sota
Link to comment

info nugget: 64 buckets needs the memory buffer set to at least 6750, to keep the sorts uniform, except for the sorts the program forces to quicksort.
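
For reference, that maps onto the plotter's bucket and memory switches like so (-u and -b are the chia CLI's bucket count and memory-in-MiB flags; the k size, thread count, and paths are placeholders):

import subprocess

# One plot with 64 buckets and a 6750 MiB sort buffer, per the nugget above.
subprocess.run(["chia", "plots", "create",
                "-k", "32", "-r", "4",
                "-u", "64", "-b", "6750",
                "-t", "D:\\plots\\11", "-d", "E:\\farm"],
               check=True)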

 

6000 s delay: 10 plots have not started; 6, however, are showing 5 threads in Resource Monitor.

3 are in phase 3.

1 is in phase 2.

6 are in phase 1 (probably the 6 with 5 threads).

note: I told the process to use 4 threads. I'm guessing it's +1 for some kind of management thread?

CPU is showing on the chart at about 91% utilization.

 

Thinking that 6000 needs to be bigger, or I need to put a delay in between the 2 "natural" groups of 8 drives (2 drive controllers, 8 disks on each; temp drives/folders are labeled 11-18 and 21-28).

A delay between groups could keep phase 1 down to 4 plots with any luck. I'll look at the time delta between the start of #18 and the end of #11, and toss in a delay of at least that value. Of course, at that point I could probably just start a new plot on #11 and forgo even plotting on 21-28. I'll have to see how many plots/day this winds up with, though, before I make that determination.

 

Looking at Resource Monitor, I can see that it's starting to pull a Scotty... the system's overloading, captain! :D

Think I'll kill the script for the rest of the 20-series plots and let this settle out a bit.

Edited by sota
Link to comment

So basically it appears there is an inverse relationship between buckets and memory? I'm surprised I have not seen anyone else mention using fewer buckets / more memory. Particularly with machines that are core- or drive-limited, it seems like you would want to maximize memory usage.

 

I just got Plotman synced up and set up on both machines. Basically in a holding pattern at this point: I had to move the 4TB drives into the main server, where they are replacing a pool in Unraid; the old drives are going into the garage server, but as storage drives, not plotting drives.

 

Waiting for the 10k SAS drives to arrive, but the seller might be trying to jerk me around.

 

They took the listing of 10 for $7 down a few hours after I purchased it and then re-listed the drives individually for $20 a pop.

 

It's been 2 weeks now and it still has not shipped. I sent them a message and they gave me a line about being out of stock, but I sent back links to dozens, possibly hundreds, of drives with the same part numbers currently listed in their store.

 

Waiting on a reply now.

 

The same store tried to screw me over by adding shipping costs to items out of the blue, and if you ordered multiple items the total shipping cost was more than buying them in sets of 2. When I messaged them multiple times, I was just told "this is our shipping price now".

 

Sad; they used to be one of my go-to eBay stores, and they generally had good deals if you dug through their listings.

 

If they want to raise the price, so be it; sucks I missed out. Such was the case with an HBA from them I'd had my eye on for a while, which went from $90 with cables to $240 without cables.

 

This is a really scummy way of doing things though IMHO.

 

/rant.

Edited by TexasUnraid
Link to comment

Yeah, it's one thing to raise prices, but if I'm obligated to fulfill my end of the transaction (paying for it), they're obligated to sell it at the agreed price. If that's less than they could have gotten for it, oh well; there was an agreement. I'd be leaving them a NEGATIVE review ASAP, after giving them one final "send 'em" demand.

Link to comment

Yeah, I plan to do that; it would be the first negative I've left on eBay in like 10 years if it comes to that (had someone pull something similar with a welder that I won for less than he wanted to sell it for).

 

Hopefully they just send the drives; I know they have them, and my planned setup is based around them.

 

I would also find it a lot harder to do Chia if I was murdering SSDs in the process lol.

Edited by TexasUnraid
Link to comment

So I looked up something, and it turns out you can mark a disk or volume as read-only in Windows (hint: diskpart, then select disk, then attributes disk set readonly).
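
A sketch of the whole round trip via diskpart's script mode (the disk number is just an example; check "list disk" first, and it needs an elevated prompt):

import os
import subprocess
import tempfile

# diskpart /s runs commands from a script file. Swap "set" for "clear"
# to make the disk writable again. Disk 3 is a placeholder.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("select disk 3\nattributes disk set readonly\n")
    script_path = f.name
try:
    subprocess.run(["diskpart", "/s", script_path], check=True)
finally:
    os.remove(script_path)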

cool. so I tested it against one of my plot drives, then tried to create a new file on the disk.

 

[screenshot: the write attempt is refused]

 

damn, son! :D

Edited by sota
Link to comment

So I'm not going to glean and post the needed data, but the results of what I'm seeing are telling me this thing can't be scheduled to do lots of 16 plots efficiently. It looks like it's 8 or fewer, even with a staggered start. And as for scheduling of plot runs, trying to keep the timing clear is like herding cats... impossible. So, once this thing finishes with the plots already processing, I'm setting up a 60 minute staggered start for 8 plots and letting each one run in sequence. They'll diverge and converge as they will. Start 'em all and let GOD sort it out. :D

Link to comment
14 minutes ago, sota said:

So I'm not going to glean and post the needed data, but the results of what I'm seeing are telling me this thing can't be scheduled to do lots of 16 plots efficiently. It looks like it's 8 or fewer, even with a staggered start. And as for scheduling of plot runs, trying to keep the timing clear is like herding cats... impossible. So, once this thing finishes with the plots already processing, I'm setting up a 60 minute staggered start for 8 plots and letting each one run in sequence. They'll diverge and converge as they will. Start 'em all and let GOD sort it out. :D

I found 1 hr staggers are a good starting point; I had good results from that.

Once I got it down to 30 min, but I had to build in a job cap of around 12 to keep the system happy (it's still a NAS and app server for everything else!).

Link to comment

on a good note, it appears my farm is working correctly...

 

2021-06-02T17:05:25.332 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.19568 s. Total 81 plots
2021-06-02T17:05:33.364 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.18119 s. Total 81 plots
2021-06-02T17:05:43.342 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.08190 s. Total 81 plots
2021-06-02T17:05:52.879 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.04482 s. Total 81 plots
2021-06-02T17:06:00.708 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.05575 s. Total 81 plots
2021-06-02T17:06:09.485 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming a6ac75822c... Found 0 proofs. Time: 0.03305 s. Total 81 plots
2021-06-02T17:06:18.858 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming 57c2a69fb1... Found 0 proofs. Time: 0.33405 s. Total 81 plots
2021-06-02T17:06:27.611 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming 57c2a69fb1... Found 0 proofs. Time: 0.04072 s. Total 81 plots
2021-06-02T17:06:37.191 harvester chia.harvester.harvester: INFO     1 plots were eligible for farming 57c2a69fb1... Found 0 proofs. Time: 0.28098 s. Total 81 plots
2021-06-02T17:06:49.360 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming 57c2a69fb1... Found 0 proofs. Time: 0.03999 s. Total 81 plots
2021-06-02T17:06:58.574 harvester chia.harvester.harvester: INFO     0 plots were eligible for farming 57c2a69fb1... Found 0 proofs. Time: 0.04503 s. Total 81 plots

 


 

Haven't won anything yet, but at least it looks like I could.

Link to comment

And I'm of 2 minds on pools.

Since I'm about to split out 8 of the 16 drives from the one machine (it can't use them all effectively) and stick them in another identical machine, I may split my plotting between solo farming and pools. I dunno yet.

Link to comment
