whitewlf

Everything posted by whitewlf

  1. Not sure if this is what you are stumbling against, but the script this runs on does some voodoo when it starts. Once you see it, you can understand its usefulness and how to work with it. You just cannot work -against- it. It took a little snooping to figure out on my own, but it is probably explained on the git page. From my understanding, in order to keep a central point of storage shared between multiple installations/types of Stable Diffusion, it creates symlinks inside the docker that point each installation's common folders (models, output, loras, vaes, embeddings, etc.) at a shared location. I included a trimmed snippet below of the log output showing what gets associated to where in the 02-sd-webui (A1111) installation which I am using:

        moving folder /config/02-sd-webui/webui/models/Stable-diffusion to /config/models/stable-diffusion
        removing folder /config/02-sd-webui/webui/models/Stable-diffusion and create symlink
        moving folder /config/02-sd-webui/webui/models/hypernetworks to /config/models/hypernetwork
        removing folder /config/02-sd-webui/webui/models/hypernetworks and create symlink
        moving folder /config/02-sd-webui/webui/models/Lora to /config/models/lora
        removing folder /config/02-sd-webui/webui/models/Lora and create symlink
        moving folder /config/02-sd-webui/webui/models/VAE to /config/models/vae
        removing folder /config/02-sd-webui/webui/models/VAE and create symlink
        moving folder /config/02-sd-webui/webui/embeddings to /config/models/embeddings
        removing folder /config/02-sd-webui/webui/embeddings and create symlink
        moving folder /config/02-sd-webui/webui/models/ESRGAN to /config/models/upscale
        removing folder /config/02-sd-webui/webui/models/ESRGAN and create symlink
        moving folder /config/02-sd-webui/webui/models/BLIP to /config/models/blip
        removing folder /config/02-sd-webui/webui/models/BLIP and create symlink
        moving folder /config/02-sd-webui/webui/models/Codeformer to /config/models/codeformer
        removing folder /config/02-sd-webui/webui/models/Codeformer and create symlink
        moving folder /config/02-sd-webui/webui/models/GFPGAN to /config/models/gfpgan
        removing folder /config/02-sd-webui/webui/models/GFPGAN and create symlink
        moving folder /config/02-sd-webui/webui/models/LDSR to /config/models/ldsr
        removing folder /config/02-sd-webui/webui/models/LDSR and create symlink
        moving folder /config/02-sd-webui/webui/models/ControlNet to /config/models/controlnet
        removing folder /config/02-sd-webui/webui/models/ControlNet and create symlink
        moving folder /config/02-sd-webui/webui/outputs to /config/outputs/02-sd-webui
        removing folder /config/02-sd-webui/webui/outputs and create symlink
        Run Stable-Diffusion-WebUI

     Also, I cannot remember if I added this or if it is the default, but I docker-mapped the output folder to an array share, and the full models folder points to another array share. Initially I thought it would be nice to keep checkpoints etc. on my NVMe cache, but I've been hoarding again and now have a full TB of models/loras/etc. Plus, I don't swap checkpoints often; I simply run dozens of prompts simultaneously like a lunatic. I do a lot of seed shopping, so the checkpoint just sits on the card, and loras are small, so there is no reason to burn the faster storage, for me. Plus, the share uses the array cache anyway.
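     If you want to replicate that by hand for a folder the script doesn't handle, the pattern the log shows is just move-then-symlink. A minimal sketch (the Lora paths are copied from my log; treat everything else as illustrative, not the script's actual code):

     ```bash
     # Move the per-UI folder's contents into the shared location, then
     # replace the folder with a symlink so the webui still finds
     # everything where it expects.
     SRC="/config/02-sd-webui/webui/models/Lora"
     DEST="/config/models/lora"

     mkdir -p "$DEST"
     cp -a "$SRC"/. "$DEST"/   # merge any existing files into the shared folder
     rm -rf "$SRC"
     ln -s "$DEST" "$SRC"
     ```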
  2. I've had the looping happen a couple of times now; it's hard to say what starts it. The first time, I thought it was after an Unraid docker update, but just a bit ago it happened while I was making images and tried to change the checkpoint... it simply stopped and gave me odd errors. I lost the log, but it was about being unable to access date-named files (corresponding to a checkpoint change) in this folder: "..appdata2/stable-diffusion/02-sd-webui/webui/config_states/". I tried restarting, and it just looped... taking forever to do so.

     Previously I had added/uncommented this in the parameters.txt file: --reinstall-xformers --reinstall-torch. I then re-commented them afterwards, so I can only assume this is why it started looping when I restarted the container after the errors. And this alone fixed the looping: no folders deleted, and no configs/extensions lost. Not sure this helps everyone, but it seems to have worked for me twice now. Hope it fixes the issue at least for some.

     I do have a question: how did you get Forge installed on this docker? I haven't tried it yet on my desktop, but all the word is that it is much better for speed on the smaller cards. I'm getting about 1-5 it/s for 512-1024 range images on my current RTX 2000 12GB.

     Note: I run Unraid in a 2U E5 20-core Xeon Supermicro chassis. I recently got an RTX 2000 12GB unit off eBay, and it has been fantastic. It only burns about 60 watts while churning images, and works well for Plex transcode at the same time. While running SD on my desktop (RTX 2080 8GB), it pretty much eats my whole machine; I can't play games while it's running, and can't 'pause' the queue to do so. $400 for a server GPU that sips power while munching images is a good deal. The 12GB vs. 8GB alone is a huge thing. Sadly no 24GB low-profile card is likely to ever appear, but a 4U chassis and a new server may happen at some point.
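     For anyone trying the same fix, the edit was roughly this (from memory, and assuming your template's parameters.txt treats # as a comment marker like mine did):

     ```
     # parameters.txt - uncomment for ONE restart, then comment out again,
     # otherwise the container will reinstall torch/xformers on every start:
     --reinstall-xformers --reinstall-torch
     ```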
  3. Q: Could a 4-port SFP+ PCIe card be used to provide connections (e.g. SMB (or iSCSI?), Workstation <-> Unraid array) effectively, or would it be better to go through an external switch?

     I have been pricing out switches to add to my lab/LAN, and the pricing is of course a major issue, partly due to one 'requirement' I want to add, which is 10Gb networking to the PC workstations. This is just for home/homelab, so I'm going for the best mix of efficiency and 'lab' use for NetAdmin experiments and such. I happen to have access to some decent (if limited) gear for free, so I'm looking at the viability.

     The server is a 2U Xeon SuperMicro, 12-bay. It came with a 4-port SFP+ NIC and a nice SAS RAID controller (in HBA/SATA split mode). The storage/use case is both a common media server on Unraid and some game servers, but also content-creation uploads from the workstations (e.g. video streams for interim/stock/edit storage); so, light archive and general storage of 2K/4K video for YouTube content creation. Moving a 1-4GB file over the 1Gb LAN runs about 110 MB/s sustained currently (RAM and NVMe cache work really well on Unraid usually). 10G should allow editing directly over the network, though that isn't the primary idea... unless it's really that good.

     We originally wired up dual Cat6a copper runs to each workstation, but are now looking at adding a fiber run to each (only 3 runs). The original intent was 10G copper, but fiber seems doable/better... and cheaper, since we have the 10G fiber SFPs already, and 10G copper SFPs are just very expensive/hot/etc. A big managed unit with 4x SFP+ is a hard ask; the price gets very high, especially if also needing some PoE and other gigabit ports. One alternative is a smaller switch like the 4x SFP+ MikroTik CRS305, plus whatever 12-16 port gigabit switch with 8 or more PoE ports (a couple of WAPs, some cameras, then various 1G copper devices; bonded 1G for WAN is also likely, to reach 2G+ later).

     So, the question is: if we saved a bit of $$ on the 10G switch and just ran the 4x SFP+ card directly into the Unraid server, would that be significantly different from a separate switch? Almost all traffic is workstation <-> Unraid, not between workstations. Internet WAN is 1.2Gbps Xfinity... which could be 2G soon enough, package deal depending. So having the internet via the 10G link would be a bit nicer, vs. internet via 1G copper alongside the direct 10G to Unraid. I'm a bit unsure how to bridge internet over the 10G links if using the direct-to-card method as well. (The server has 4x 1G as well, of course.)
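     To make the direct-to-card idea concrete, here is roughly how I'd sketch it (interface names and addresses are made up; I haven't built this yet): one tiny subnet per SFP+ port, with each workstation keeping its 1G copper link as the default route so internet still flows through the main switch.

     ```bash
     # On the Unraid server: one /30 per SFP+ port (example names/addresses).
     ip addr add 10.10.1.1/30 dev eth2    # fiber link to workstation A
     ip addr add 10.10.2.1/30 dev eth3    # fiber link to workstation B

     # On workstation A: address the fiber link only; no default route here,
     # so internet keeps using the 1G copper NIC and the main switch.
     ip addr add 10.10.1.2/30 dev enp5s0

     # Point SMB/iSCSI at 10.10.1.1, then sanity-check raw throughput:
     iperf3 -s                # run on the server
     iperf3 -c 10.10.1.1      # run on the workstation
     ```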
  4. I recently got a nearly new, corporate-surplus Supermicro server decked out with some lovely hardware, which I am moving my Unraid onto. It came with an Adaptec ASR-81605z with battery backup, feeding the chassis's 12x hot-swap SATA bays via SAS-to-SATA breakout cables. Does anyone know if this card has any issues with Unraid, aside from needing to be set to HBA/single-disks mode? E.g., does it need a special firmware to expose the drives properly, etc.? The motherboard is a SuperMicro X10SRH-CLN4F (https://www.supermicro.com/en/products/motherboard/X10SRH-CLN4F).
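     Once it's racked, my plan is to sanity-check that the card is really passing raw disks through rather than wrapping them in logical volumes (device names below are examples):

     ```bash
     # In true HBA/pass-through mode each drive should report its own model
     # and serial number, which Unraid needs to track disks reliably.
     smartctl -i /dev/sdb
     lsblk -o NAME,MODEL,SERIAL,SIZE
     ```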
  5. Quick update: everything does seem OK, no errors thus far. The array rebuilt, the new drives cleared and were added in, and now it's syncing up the second parity, which will take quite a while. Since that needs to read 100% of every drive, it should trip any lurking problems. 27.8% done; 40TB usable, 17.7TB free.
  6. Letting it add these two new drives in first; I will do more checks when it finishes, and run a parity check. Best to add the second drive first? Otherwise it will likely be 3 days + 3 days to do it twice. I am still fairly wary of Disk 2. If it has more than a small bit of error, it's likely best to warranty it out; it's only 9-10 months old. Thanks for all the help, I'll let you know how it ends up. Or blows up.
  7. Already did the New Config, "parity is valid" approach, then ran btrfs check --repair on disks 2 & 3, ran short SMART tests on all disks, stopped the array, and added the two more drives. It's clearing them now overnight. I will add the second parity when/if that finishes. I won't really know if the data is damaged until I run across it. No errors reported yet. selene-diagnostics-20180611-2344.zip
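     For the record, the per-disk check was roughly this, run with the array in maintenance mode so the filesystems are unmounted (the md device number is an example; on Unraid each array disk is /dev/mdX):

     ```bash
     # Read-only pass first; --repair is a last resort and can make things worse.
     btrfs check /dev/md2
     # Only if the read-only pass reports errors it can actually fix:
     btrfs check --repair /dev/md2
     ```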
  8. As this array is simply a personal, cost-effective mass-storage solution, I was choosing the least expensive drives that are readily available. My local Fry's has them for $159 or less on sale. They do not seem to stock the WD in 8TB often. I am still hoping someone who knows more about force-enabling the out-of-sync offline drive can advise on the above.
  9. On second glance, those preclearing drives have spun down and stopped doing anything. They were writing at ~98-110 MB/s last night. The counter is still stuck at 39% progress. I am thinking something hung up.
  10. I could not see a direct way of doing so, to enable Disk 3. The two new devices are still clearing and taking an awfully long time; it's been about 6-7 hours and they're only 40% done.

      I found a mention about enabling an out-of-parity disk on Unraid 6, saying to drop the parity disk from the array, add it back, and click "Accept parity". Would this be the process to use to re-enable Disk 3? Also, I am not sure what the preclearing process actually does. Should I wait for it to complete, or just stop the preclear now, stop the array, drop the parity, re-add it, etc.? I am just trying not to make a misstep that makes this more difficult or loses more data.

      On that note, anyone have an idea why Disk 2 would go from seemingly fine to spewing errors after only an unclean shutdown? My only guess is that the 2-hour downtime cooled the drive and the thermal fluctuation exposed a flaw. It was the first time any of the drives had been shut down for longer than a couple of minutes since install, and even then only a couple of times.
  11. I have/had a 5-drive system (4x 8TB with parity, 1x SSD cache). While it is on a UPS, it shares it with my PC, and the PC had the USB line to the UPS (swapped/fixed that now). This morning we had a 2-hour power outage; I am unsure if there was any flicker. All drives and both the PC and the Unraid box are on the UPS. When the Unraid box spun back up, it had about 1000 read errors on Disk 2, and Disk 3 was disabled.

      I had -just- bought a new drive Friday to add space to the array but hadn't put it in yet. I ran out and bought 2 more drives today, and all three are in the unit now: 2 preclearing, 1 waiting to become a second parity once that finishes. (Can't do it all at once.)

      Disk 2 is now throwing tons of read errors, and took SMB offline. It passed a btrfs check and a SMART test before I added the new drives and, until this morning's unclean shutdown, had not shown any errors at all. Disk 3 passed long and short SMART tests, and btrfs. The data on it appears to be intact as well, but, due to what is likely a tiny parity mismatch, it's not in the array. With Disk 2 acting this squirrely, yet not yet kicked from the array, I doubt I will get far salvaging the data from it, but it will also leave Disk 3's emulated data broken. While I am not sure all the data on Disk 3 is 100%, I really don't care if a little is lost. I can likely replace it, but replacing 7TB + 7TB (they were both nearly full) is entirely another pain in the ass. These are media files, most are huge, and they can survive a little damage, possibly.

      What is the best suggestion to proceed once the 2 new data drives enter the array? I'm holding off on the parity for now; plus, I'm wondering if I should use that drive to direct-copy the Disk 3 data onto if things go any more pear-shaped. Is there any way to force Disk 3 back to active, accepting that a bit of the parity might be bogus? This would allow me to remove Disk 2 and rebuild onto the new drives, given that real errors > a likely touch of parity misalignment.

      Note: all the data drives are Seagate Expansion drives, identical models inside and out afaict, shingled (SMR) design. They are currently all individual USB 3 units, on 2x PCIe 4-port USB3 low-profile cards (until a new server is built and the drives shucked). They are all under a year old. I plan on a second cache drive as soon as this data is fixed, as well (2 parity, 5-6 data, 2 cache SSDs... depending on Seagate's warranty for the erroring drive). I would have loved to do IronWolfs, but they are 2.5x the cost, and I don't have a tower server ready for that atm. This server is (hilariously, and impressively) running on a low-profile i5 ex-office PC. The USB connections are obviously less than ideal, but they have been running for nearly a year with only minor slowness (heavy use of SABnzbd, Plex, Samba, parity, etc. likely causes high IO load, and large file deletes are slow, but it's been adequate for simple Plex/SAB/Sick/CP).

      On a side question: is there any way, in the future, to pre-emptively empty an array data drive, i.e. have it spread its contents to the other devices, before it is removed/replaced for old age or because it starts tossing errors? (Versus the cold-turkey pull-it-and-rebuild method; a rough manual sketch of what I mean is below.) I'm assuming this isn't a common method, perhaps due to the similar amount of reads/writes involved versus the eventual rebuild anyway, but I thought I should ask. selene-diagnostics-20180611-0009.zip
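      On that last question, the manual version of what I mean would be something like this (disk numbers are examples, and I haven't actually tried it): copy the aging disk's contents onto another data disk, verify, then remove the drive and rebuild parity without it.

      ```bash
      # Evacuate disk3 onto disk4 (run from the Unraid console; -a preserves
      # permissions and timestamps, --progress shows per-file status).
      rsync -a --progress /mnt/disk3/ /mnt/disk4/

      # Dry-run with checksums afterwards; anything listed differs or is missing.
      rsync -anc /mnt/disk3/ /mnt/disk4/
      ```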