Chia farming, plotting; array and unassigned devices


Shunz

Recommended Posts

Posted (edited)

getting some interesting preliminary results from this latest run (16p1t).

Phase 2 DEFINITELY suffers on only 1 thread. Total estimated time to complete is up about 45%; however, the average plot time is DOWN 16%.

I also need to figure out how to have a final plot staging drive on this machine, as having to ship each plot over the network is going to have a statistically significant impact on overall plot speed, since I can't start a new plot until the completed one has left the drive (they're only 300GB disks). Or... can I? I'd run that test now, but I'm making changes to the plot parameters, and I also want to reformat the disks to a different cluster size.
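A sketch of the kind of "mover" I mean, as one pass you could run from cron or a loop. Paths and the function name are placeholders, not the actual layout here:

```shell
# One pass of a naive "mover": ship finished .plot files from the staging
# drive to the farm so the temp disk frees up for the next plot.
move_plots() {
    src=$1; dst=$2
    for f in "$src"/*.plot; do
        [ -e "$f" ] || continue            # glob matched nothing
        mv "$f" "$dst"/ && echo "moved $(basename "$f")"
    done
}
```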

 

Oh, and a P410 controller is slower than a P420.

Edited by sota
Link to comment

The plot managers out there (plotman being one of them) will automatically kick off the next plot when the current one enters phase 4 IIRC (the copying stage), and can be tuned to whatever you want.
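For reference, the stagger knobs live in plotman's `plotman.yaml`. The field names below are from memory of the sample config and the values are made up, so check the `plotman.yaml` that ships with plotman before copying anything:

```yaml
scheduling:
  global_max_jobs: 16             # hard cap on concurrent plots
  tmpdir_max_jobs: 4              # cap per temp drive
  tmpdir_stagger_phase_major: 2   # allow the next job on a temp dir once
  tmpdir_stagger_phase_minor: 1   # the previous one reaches phase 2:1
  polling_time_s: 20
```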

 

With careful tuning it is indeed possible to overprovision a drive/RAM/CPU. Takes trial and error though. With a 300GB drive there is not much room for error, that is for sure.

 

On Linux I use netdata to log everything so I can go back and see the peak RAM/CPU/disk usage and adjust accordingly. Makes it pretty easy, at least in theory, once I get the new setup working. With staggered plots I'm pretty sure I can have a fair amount more running than I would think. The issue will be the IOPS of the drives at that point.

 

Just got the cable in for the JBOD and I'm testing that today. It is a SAS1 backplane so limited to around 1.2GB/s, but gonna test that. I have a SAS2 backplane and/or an expander I could use instead but don't really want to move those unless I have to.

 

Expanders went from $18 everywhere to $65 and hard to find in the last week!

Link to comment
Posted (edited)

Look into the Machinaris docker on unraid/Linux. I think I've settled on using it for long-term plotting, and it has plotman built in. It also uses a webgui so you can manage multiple systems easily.

 

It is also set up for harvesting/plotting (basically a much better method of using multiple computers, like you do).

 

I will run it on unraid for my main machine and then just use an Ubuntu install on the rest, with portainer and Machinaris most likely.

 

Apparently it runs on windows as well.

Edited by TexasUnraid
Link to comment

Interesting results; it further shows that parallel is king.

 

Wonder what would happen if you over-provisioned the CPU with 16p2t or even 4t.

 

If you staggered the plots you could do 4-6t pretty easily, I am guessing, since the later phases would not be competing for the threads.

Link to comment

That's part of what I'm planning on doing next.

reformat all plotter drives to 32k sectors. (worth potentially a couple % off total time)

change to 64 buckets (could be worth another 8%?)

6k ram per plot (no idea what that improved, % wise.)

stagger start plots in groups of 4 (undecided on time delay at this point though.)
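Spelled out as a chia CLI call (`-u` is buckets, `-b` is sort memory in MiB, `-r` is threads). The scratch/dest paths and the 2-thread count are my assumptions, and I've taken "6k ram" as 6144 MiB:

```shell
# Next-round settings as a plot invocation; printed rather than run,
# since paths here are placeholders for one of the 32k-cluster disks.
TMP=/mnt/plotter1
DST=/mnt/harvest
CMD="chia plots create -k 32 -u 64 -b 6144 -r 2 -t $TMP -d $DST"
echo "$CMD"
```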

 

Think I found a disk I can use for an intermediate: 2TB WD Scorpio Blue 2.5", since I've figured out how to use the CD-ROM header in the server as a standard HDD location.

 

Link to comment

What do the buckets do? I keep meaning to figure out that setting.

 

I am trying to figure out why my backplane in the JBOD seems to be limited to 600MB/s when I was expecting 1.2GB/s.

 

It is the Supermicro SAS1 backplane. I got it for free and figured I might be able to put it to use.

 

Edit: OK, really strange, I swapped from 8x SSDs to some random drives and now I'm getting more total bandwidth? Up around 800MB/s, but I think that is the limit of the drives.

 

Very odd. Guess I will have to wait for the sas drives to show up to really test it.

Link to comment


And as for how buckets and blocks (memory) are related, this is just from some recent observations and deductions; take them with a grain of salt:

Buckets are how many pieces each phase's dataset is broken into. You ideally want your blocks (memory) to be big enough to complete the sort function entirely in memory (uniform sort). If it's not, you wind up dropping to the slower (?) quicksort method.

So, if you have more memory, you can have fewer buckets, and it should be able to more quickly process the entire dataset.

I'm attempting to tune the buckets and blocks for each machine and type that I have here.

Link to comment
16 minutes ago, sota said:

And as for how buckets and blocks (memory) are related, this is just from some recent observations and deductions; take them with a grain of salt:

Buckets are how many pieces each phase's dataset is broken into. You ideally want your blocks (memory) to be big enough to complete the sort function entirely in memory (uniform sort). If it's not, you wind up dropping to the slower (?) quicksort method.

So, if you have more memory, you can have fewer buckets, and it should be able to more quickly process the entire dataset.

I'm attempting to tune the buckets and blocks for each machine and type that I have here.

 

Very interesting, I will have to try cutting the buckets down as well since I have plenty of extra memory.

Link to comment

When I cut the buckets in half to 64, I saw various process lines saying the u-sort needed 6.5GB of RAM to work, and since it didn't have it, it was dropping to q-sort. I'm setting my buffer (block) size to 6656 for the next round.
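The back-of-envelope math behind that number (treating the sort buffer as roughly doubling when buckets halve; the 3328 MiB starting point is my inference from the ~6.5GB the logs reported, not a measured value):

```shell
# Halving buckets doubles the per-bucket buffer each uniform sort needs.
need_128=3328                    # MiB per sort at 128 buckets (inferred)
need_64=$((need_128 * 2))        # 64 buckets -> double the buffer
echo "set -b to at least ${need_64} MiB for 64 buckets"
```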

Link to comment

I'm thinking of using the UNRAID array without parity for harvesting storage until I hit the 33-drive limit. I wish UNRAID already supported having multiple UNRAID arrays, as the only alternatives after 33 drives are to use Unassigned Devices or a BTRFS-based pool with parity.

 

To make using Unassigned Devices more palatable I'm considering using mergerfs. The script made by @testdasi, which is based on the work done here, is a good starting point for those interested in using mergerfs to make all disks mounted using Unassigned Devices accessible over one share/mount point.
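As a starting point, a mergerfs pool over UD mounts can be a single fstab line like the one below. The branch glob, mountpoint, and `minfreespace` value (just above one k32 plot) are placeholders to adapt, and option names should be checked against the mergerfs docs for your version:

```
/mnt/disks/chia* /mnt/chia-pool fuse.mergerfs allow_other,use_ino,category.create=mfs,minfreespace=110G 0 0
```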

 

 

Link to comment

I'm sure the powers that be will chime in, but given the intended use of unRAID, the drive count limitation made sense.  Chia is a New Thing(tm) that I'd be surprised if the unRAID dev team even saw coming, so the need for so many drives in a non-corporate environment is unexpected.  I'd be willing to bet there are technical background issues with having so many drives, and with potentially wanting to have parity as well.  Now, if unRAID wanted to offer the ability to go (virtually) unlimited drives with NO parity options, that could be interesting.  Even if they put it out as a total alpha build (it breaks, don't come crying to us), that could be viable for Chia and other future space-time coin projects.

 

Just the thoughts of a filthy casual. :D

 

Link to comment

So the Big Box plotter is now on the ghetto farm.

16 300GB SAS DP 15k drives, set up as individual disks.

2TB SATA WD Scorpio Blue as boot and harvest storage.

1 plot started every 10 minutes.

post-start delay of 50,400 after all plots are launched. (to be edited as time goes on)

mover checks for new plots in harvest, and moves them to the farm.
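The 10-minute stagger can be driven by a launcher along these lines. This is a sketch: the actual plot command is elided, and the function just logs each start so the timing is visible.

```shell
# Start N plot jobs DELAY seconds apart so phase-1 CPU peaks don't pile up.
launch_staggered() {
    n=$1; delay=$2; i=1
    while [ "$i" -le "$n" ]; do
        echo "starting plot $i"
        # the real (backgrounded) chia plots create command would go here
        [ "$i" -lt "$n" ] && sleep "$delay"
        i=$((i + 1))
    done
}
# usage: launch_staggered 16 600   (16 plots, 10 minutes apart)
```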

 

Got the main script for all the other plotters dialed in, I think. Tuned bucket, block, and thread counts, and temp storage for each.

 

And I lost one of my smaller plotters.  Looks like a memory chip in it went tits up, or so it seems.  Still testing to see if I can bring it back online.

 

Link to comment
8 hours ago, ryanhaver said:

I'm thinking of using the UNRAID array without parity for harvesting storage until I hit the 33-drive limit. I wish UNRAID already supported having multiple UNRAID arrays, as the only alternatives after 33 drives are to use Unassigned Devices or a BTRFS-based pool with parity.

 

To make using Unassigned Devices more palatable I'm considering using mergerfs. The script made by @testdasi, which is based on the work done here, is a good starting point for those interested in using mergerfs to make all disks mounted using Unassigned Devices accessible over one share/mount point.

 

 

I am still debating how I am going to do it, but most likely will use pools vs the array. The array is nice for files for one reason and one reason only IMHO: the ability to lose a drive and only lose what was on that drive.

 

If this is a dedicated unraid license for chia, then the array could work, particularly without a parity drive. Although after that you would have to move to pools anyways, so you might as well start out there IMHO.

 

If you will be having a bunch of drives, you can run raid5 so you have some redundancy, although it wastes some space. You could do raid0, but if you lose a drive you lose everything.

 

Alternatively you can use Unassigned Devices and just set up the drives individually. I am debating this method or pools. Most likely I will use a raid5 pool for all my small drives, as the wasted space will not be that much and some redundancy would be nice since I don't trust those. Also there would be a lot of wasted space without raid anyways.

 

For the larger drives still debating.

Link to comment
19 minutes ago, sota said:

Um... I know it's only been running for about 30 minutes, but if the stats I just pulled for F1 are any indication, holy hades!

Interesting, so you are running 16p32t? or 32p64t?

 

Are you maxing out CPU/RAM/HDD at this point, or have you still got more headroom?

Link to comment
Posted (edited)

Pretty sure I'm maxing out the CPU right now.

16 plots, 2 threads each.

 

here's some raw data:

F1 complete, time: 197.843 seconds. CPU (187.92%) Sun May 30 08:49:38 2021
F1 complete, time: 230.411 seconds. CPU (184.9%) Sun May 30 09:00:04 2021
F1 complete, time: 204.651 seconds. CPU (185.32%) Sun May 30 09:09:39 2021
F1 complete, time: 228.132 seconds. CPU (186.65%) Sun May 30 09:20:02 2021
F1 complete, time: 257.852 seconds. CPU (185.06%) Sun May 30 09:30:32 2021
F1 complete, time: 281.433 seconds. CPU (184.08%) Sun May 30 09:40:56 2021
F1 complete, time: 281.054 seconds. CPU (183.65%) Sun May 30 09:42:39 2021
F1 complete, time: 289.449 seconds. CPU (179.14%) Sun May 30 09:52:47 2021
F1 complete, time: 304.029 seconds. CPU (173.73%) Sun May 30 10:03:02 2021
F1 complete, time: 350.178 seconds. CPU (150.61%) Sun May 30 10:12:21 2021
F1 complete, time: 345.094 seconds. CPU (155.06%) Sun May 30 10:12:17 2021
F1 complete, time: 389.805 seconds. CPU (133.09%) Sun May 30 10:20:39 2021
F1 complete, time: 449.83 seconds. CPU (115.89%) Sun May 30 10:31:43 2021
F1 complete, time: 403.385 seconds. CPU (125.08%) Sun May 30 10:41:05 2021
F1 complete, time: 588.592 seconds. CPU (88.76%) Sun May 30 10:54:15 2021
F1 complete, time: 512.601 seconds. CPU (100.09%) Sun May 30 11:03:06 2021

You can see the times degraded as more plots were brought online.  The start times got a little weird in the middle, as I accidentally killed the window that was managing the timing.  What's interesting though is the time delta between each ending plot; if you "filter" the noise in your head you can see they're pretty darn consistent.

 


 

The average was still 332.146 seconds, which is 410.03 seconds faster than the baseline 8p2t blob storage run.

 

Curious to see what might be going on, I dumped the Phase 1 start times:

Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 08:46:21 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 08:56:14 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:06:14 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:16:14 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:26:14 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:36:15 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:37:58 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:47:58 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 09:57:58 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:06:30 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:06:32 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:14:09 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:24:13 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:34:21 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:44:26 2021
Starting phase 1/4: Forward Propagation into tmp files... Sun May 30 10:54:33 2021

 

Doing a little Excel wizardry I get this:

start		end		delta
8:46:21 AM	8:49:38		0:03:17
8:56:14 AM	9:00:04		0:03:50
9:06:14 AM	9:09:39		0:03:25
9:16:14 AM	9:20:02		0:03:48
9:26:14 AM	9:30:32		0:04:18
9:36:15 AM	9:40:56		0:04:41
9:37:58 AM	9:42:39		0:04:41
9:47:58 AM	9:52:47		0:04:49
9:57:58 AM	10:03:02	0:05:04
10:06:30 AM	10:12:21	0:05:51
10:06:32 AM	10:12:17	0:05:45
10:14:09 AM	10:20:39	0:06:30
10:24:13 AM	10:31:43	0:07:30
10:34:21 AM	10:41:05	0:06:44
10:44:26 AM	10:54:15	0:09:49
10:54:33 AM	11:03:06	0:08:33

So the run time for F1 matches what's being reported by the system, but there doesn't seem to be a major effect on the end time of the F1 process, even with nearly all plots deep into Table 2 bucket sorts.
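For anyone repeating this without Excel: the F1 average can be pulled straight out of the plot logs with awk. The two inlined lines are just samples from the data above; point it at the real log files instead of the heredoc.

```shell
# Average the F1 times from chia plot logs; $4 is the seconds field in
# lines like "F1 complete, time: 197.843 seconds. ...".
awk '/^F1 complete/ { sum += $4; n++ }
     END { printf "avg %.3f over %d plots\n", sum/n, n }' <<'EOF'
F1 complete, time: 197.843 seconds. CPU (187.92%) Sun May 30 08:49:38 2021
F1 complete, time: 230.411 seconds. CPU (184.9%) Sun May 30 09:00:04 2021
EOF
```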

 

Still waiting on Phase 1 times to see what happens, which I estimate to be as early as 6:30pm today.  Probably a lot later than that though :D

 

 

 

Edited by sota
Link to comment

Looks like I've also resurrected Chia9 (memory tests came up clean, and now it's booting fine), but Chia6 just decided to throw the infamous HP 5 beeps/red flashes, which also indicates a memory problem.

starting to think these old boxes don't want to do the work. :D 

Link to comment

Interesting results on phase 1; not sure what to make of it yet. Too bad you can't log the CPU/memory usage in tandem with the plot times; that could give a good window into what is going on.

 

I do not envy working on old gear; the oldest I will mess with nowadays is Sandy Bridge. Older than that is just not worth the time and hassle for me anymore. That said, most of my non-gaming computers are Sandy/Ivy Bridge.

 

I will take any of these over what we had to deal with back in the 90's. You can almost forget the nightmares of trying to find drivers, install them, and then troubleshoot why they are not working.

 

People nowadays take for granted just being able to plug something in and have it work, at least to some degree.

 

Think I finally got my backplane issues sorted out. I have no idea what happened or what fixed it though; I hate that. Waiting for chia to finish syncing and then going to reboot and get back to testing.

Link to comment

I just learned I cannot power up the MD1000 at this point; it pisses off the UPS everything is connected to. :D

Might have to "tweak" how and where things are plugged in, it looks like.

Link to comment

Some unconfirmed noise about Seagate making "chia"-oriented drives, possibly some really big, slow SMR disks.

Honestly, SMR is a PERFECT marriage for Chia and any future coin like it.  Maybe if they did, and it's priced right, it'll take pressure off the rest of the drive supply.

Link to comment
3 minutes ago, sota said:

Some unconfirmed noise about Seagate making "chia"-oriented drives, possibly some really big, slow SMR disks.

Honestly, SMR is a PERFECT marriage for Chia and any future coin like it.  Maybe if they did, and it's priced right, it'll take pressure off the rest of the drive supply.

 

I can totally see them doing that; my concern is that they will then start putting those in the external drives due to being cheaper and screw us shuckers.

 

At least right now we know that over 8TB is going to be a CMR drive.

 

It would be nice to think something would take pressure off the supply chain, but trying to buy a GPU for the last 6 months has made me realize that is not happening anytime soon. Just too many mouths to feed and not enough to go around.

 

Kinda like the stunt Nvidia is pulling with the mining-limited GPUs; it won't make a difference in the big picture, but it does reduce actual gamers' ability to mine in their spare time (like me on my old Vega 56).

Link to comment
