Everything posted by BLKMGK

  1. Updated from 6.8.3 NVIDIA - went smoothly! I did remove some deprecated plug-ins and apps afterwards and had zero issues installing the new NVIDIA drivers - no changes to my Plex install for transcoding needed much to my relief. I'm on an AMD x570 mobo (Taichi) with a 3700 and with some fiddling in the temp settings app I now have temp readings for the first time! I am also seeing fan speeds for seven fans - very happy! Temps for my SSD and HDD continue to work. Not seeing power readings from my UPS but I can see my GPU readings finally so that's nice. I'll fiddle with the UPS when it's not 4am lol. All in all this came thru nice and smooth so far, my thanks for all those who tested and posted about their experiences as I wasn't able to test this time around.
  2. Well, sitting at over 6 days uptime now including a full scheduled parity check without errors. I'm not yet positive what's cleared this error and that sux. Temps have been cooler and I've had the house open but I've also had a second known good UPS inline with the server. The only major things that changed between months of uptime and barely days were a new dedicated 30amp electrical circuit and a new 2U UPS, and warmer temps coincided. I guess I'll be patient but for now it's reliable - pretty frustrating. I guess I'll also mark the previously swapped PSU good and feel okay about keeping it with my spare chassis - it would've been nice had that been it as that's easiest to swap on a rackmount. I suppose I could put it in the chassis attached to the known good UPS and see if the server toggles between them but I'm not sure how I'd get the info. Anyway, I'll post anything new here and sure hope like heck it stays up and no one else goes through this grief.
  3. Not the power supply. Ran a memtest for over an hour, no issues. Have put a spare UPS in line between the recently installed LARGE UPS and the server to try and see if the issue is coming from the new rackmount UPS - that I just had to run a new circuit to install. If it's the UPS I'll be upset but happy to have found this. If this doesn't work then I'll be booting it in "safe mode" to see if that helps <sigh> and probably a day's worth of memtest!
  4. Well, that was short lived - based on text messages I got from friends it died at 1:30 so it was up maybe 5 hours. I have swapped in a new PSU and touched nothing else - it's begun a parity check. If it goes down this time I may try moving it off my uber expensive UPS and onto a small portable one I used previously just to rule that sucker out. The one it's on now has a dedicated circuit, brand new batteries, but was put in use not too far before these problems cropped up. I'll be pretty upset if that thing is the issue!
  5. I'm aware of the hazards of overclocking, I don't think I've owned a computer that wasn't overclocked in some way in the last 30 or so years if not longer. This thing would be water cooled if I could fit it without cutting. The settings now are close to stock and in Eco mode should be keeping it from boosting much and using less power. Underclocking is often done to save power but I've got enough containers running I'd prefer to not go that route unless pushed. SuperMicro chassis out of the box are quite loud. SuperMicro makes an SQ model PSU however that's dead silent and will only run its fan when necessary, this easily cuts 50% of the noise. The rear fans are the next loudest but with a little tweaking fans that are normally used in the middle of the chassis can be used and that's what I've done. I did at one time try multiple third party fans that everyone claimed worked - they didn't lol. So yeah, this is cooled with good fans and not dead silent but WAY quieter than stock for sure. The CPU cooler is the AMD unit, 80mm vertical coolers work in this chassis but not 120mm. The OEM cooler seems to be handling the job fine - and has pretty colors when I pop the top 😛 Update: For the first time in a month, as far as I've seen, Parity Check completed successfully - whew! I feel a little more comfortable maybe swapping a drive, no additional SMART errors. Also no closer to solving the sudden reboot but I think I can rule out the parity process - that's GOOD news at least. If it goes down again that PSU is coming out for sure. Fingers crossed that's not soon and I can go back to my normal 6-month-long uptime. Wish I had better resolution but I'll keep watching it. Update 2: Well, that didn't last long - failed sometime after 5am last night. No time to swap PSU as I'm on the way out the door but I guess I'll be doing that tonight. 64 in the house last night so certainly not a cooling issue sheesh. Logs show nothing at all.
  6. This is a rack mount server chassis, it can actually accommodate 2x PSU in a failover setup but I've only got one installed currently. I have only briefly pondered the PSU as it's pretty good quality and made for this kind of (ab)use. The chassis holds 24 spinners and can accommodate multiple SSD too. I do actually have a spare, more than one if I'm willing to suffer turbine whine come to think of it lol. If it goes down again (it's at 40% thru the check now) I may do this first thing as swapping it is one of the easiest things I can do actually! I'll be shocked if that's the problem but as easily as it's replaced and with a spare that seems a good first step. Oh and yes I cleaned the heatsink fins. This system has only been together just short of a year but since it seemed like a heat issue that was one of the first changes. Slowing the CPU and lowering voltages got temps down quite a bit (over 20C) so I no longer think this is heat. I also just realized I need to set up a syslog server somewhere to capture logs off box, I hate that we lose them when a system goes down. I might try building a script to dump them onto disk storage. I just came home and had this machine sleeping which broke the SSH connection ugh. This would be WAY easier if the syslog gave me clues. I look forward to the next release with its kernel having better support for my hardware and its sensors!
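The "dump them onto disk storage" idea could be sketched roughly as below. This is a minimal sketch, not a tested unRAID recipe: the function name `snapshot_log` and both paths in the usage comment are assumptions, and copying often to the USB flash drive would wear it, so a share on the array or cache may be a better destination.

```shell
#!/bin/bash
# Hypothetical helper: copy a log file to persistent storage under a
# timestamped name so a copy survives a sudden reboot.
# Usage: snapshot_log <source-log> <destination-dir>
snapshot_log() {
    local src="$1" dest_dir="$2"
    # Nothing to do if the log does not exist.
    [ -f "$src" ] || return 1
    mkdir -p "$dest_dir" || return 1
    local stamp
    stamp="$(date +%Y%m%d-%H%M%S)"
    cp "$src" "$dest_dir/$(basename "$src")-$stamp" || return 1
    # Flush buffers so the copy reaches the disk before any sudden power loss.
    sync
}

# Example (paths are assumptions): run from cron every few minutes.
# snapshot_log /var/log/syslog /mnt/user/logs
```

Each run leaves a separate timestamped copy, so the last snapshot before a crash stays intact even if the box drops mid-copy on the next run.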
  7. About three weeks ago my server dropped offline and I found it sitting at the boot main screen awaiting my crypto key input. I have a pretty solid UPS and my hardware is pretty new so this was puzzling, I was out of town at the time. I thought that it might have gotten hot but logs showed no error (tailed in an SSH session). I brought the system up and it began a parity check. Last position I saw about a day later was some 90% complete - it dropped again. It did this one more time and then I was home. Each time it seems to get close to complete with parity and appears to cold boot, my logs show nothing untoward - no errors. This is an AMD 3700 on an ASROCK TaiChi board. 32gig RAM, 4U SuperMicro case with good cooling fans. When I returned home I flashed the firmware to the latest version, lowered clock speed, put the CPU in "eco mode", and increased fan speeds. Temps don't show on the main screen for me in unRAID but in the diag screen of the BIOS I noted a significant drop in CPU temp. I also slowed my memory to a default 2400MHz vs its rated 3K+ and lowered the RAM voltage somewhat. I ran Memtest through a couple of iterations but not for terribly long as I wanted my server back up. About 19 hours later the system dropped again <sigh>. This time when it came up I halted the parity check (no errors noted in previous runs until it booted) and the system has run fine for an entire week - until today. Since this issue began this is the longest it's run and I had surmised it was possibly an issue with parity building, now it's dropped again and I'm not so sure. The only thing I have seen in the way of drive errors is a single drive that's showing UDMA CRC errors in SMART reporting slowly increasing. This is a Seagate 4TB drive, if I'm seeing 90+ percent complete that drive shouldn't be in play.
I'd like to replace it but if I cannot complete a parity check to rebuild to a larger drive that's not going to be possible I fear and realizing it's not likely being accessed when this drops makes me wonder what the real issue is. My Parity drive is 12TB, I have a total of 17 data drives of various sizes - 4, 5, 8, and 10. I have some SSD attached via SATA and a PCIe 4 NVME as cache. I do have a backup but that's a last gasp as it's a huge undertaking to restore. My fear is something will get corrupted during one of these boots and I'll lose encrypted files. My backups are daily and every file gets accessed - no errors occurring. I'm stumped and would like some suggestions please! The lack of log entries is pretty frustrating but makes me wonder if this is a straight hardware issue. I'm hesitant to swap parts without a clearer indication of what's dorked up. I'm letting it run parity again now, temp in the house has dropped with the outside temp so temp should NOT be an issue - drive temps 28-34C with most on the lower end of that range. If it drops this time I'm thinking a day's worth of Memtest maybe? My drive formats are XFS, I suppose that could be checked for corruption just in case. I might do that tonight starting with the larger drives.
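To keep an eye on the slowly climbing UDMA CRC count, something like the following could pull the raw value out of `smartctl -A` output. A hedged sketch: the function name `crc_error_count` is made up for illustration, and it assumes the usual ten-column attribute table in which attribute 199 is the UDMA CRC error count and the raw value is the last column. For the XFS check, `xfs_repair -n <device>` runs in no-modify mode and only reports what it finds.

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of attribute 199 (UDMA_CRC_Error_Count). Prints nothing if absent.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Example (device name is an assumption):
# smartctl -A /dev/sdX | crc_error_count
```

Logging that number once a day would show whether it is still climbing after a cable or port change.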
  8. Successful upgrade from an old RC - my funky full encrypted disk setup moved over just fine, thanks guys! Looking forward to the next revision to support my X570 TaiChi, will test when NVIDIA supports it! P.S. Docker Swarm?
  9. I don't have active cooling on it, next I pull it out I'll see what I can do about adding some cooling. I have airflow as it's a SuperMicro chassis but it's not ducted over there. Watching things further I think the main issue could be the Mover process destroying performance when it runs. My cache drive is an M2 PCIE4 drive but when Mover fires the system becomes nearly unresponsive. Just frustrated I suppose as I bump the space limits on the 1TB cache drive moving videos around of late and performance tanks hard. I'm on 6.8 RC7 which has been stable but it looks like I'm two revs behind so perhaps there's help to be had there. I don't think I want the release 6.8 though as I think that was a kernel step backwards! Parity checks seem low as well with a speed of 113MB/s - 100+TB of space with a 10TB parity drive. Takes a day to check but other than being "slow" it doesn't impact things too badly.
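As a sanity check on "takes a day": a parity check has to read the whole parity drive, so its duration is roughly drive size divided by average speed. A minimal sketch using decimal units (1 TB = 1,000,000 MB); the function name is made up and the result is truncated to whole hours.

```shell
#!/bin/bash
# Hypothetical helper: estimate parity check duration in whole hours.
# Usage: parity_hours <parity-size-TB> <average-speed-MB/s>
parity_hours() {
    local size_tb="$1" speed_mbs="$2"
    # 1 TB = 1,000,000 MB (decimal units, as drive vendors use).
    local seconds=$(( size_tb * 1000000 / speed_mbs ))
    echo $(( seconds / 3600 ))
}

# parity_hours 10 113   # prints 24
```

At 113MB/s a 10TB parity drive works out to a bit over 24 hours, which lines up with the day-long checks described.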
  10. Purchased one of these awhile ago as I was no longer able to run 3x dual port cards. New Ryzen boards don't have the slots and I had to run a video card now (which I use for transcoding so no biggie). I've noticed that I seem to bottleneck during parity checks and when Mover strikes I see it bogging down too. Could I have made a better choice? I need to support a max of around 24 drives, I've got an expander kicking around but have never used it. Would I be better off using that somehow with an existing 8i 2 port card? I see fairly significant IOWait times in NetData from time to time when really pushing data around but some of this could be my drives. Can I do better for a reasonable cost? Edit: reading some other posts it looks like I ought to have enough speed for spinning rust. Perhaps it's simply my drives after all but seeing IOwait numbers climb from time to time is pretty frustrating!
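For a quick look at IOWait outside NetData, the aggregate `cpu` line in `/proc/stat` carries the counters. A rough sketch, assuming the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name is made up, and the figure is an average since boot, not a live reading.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/stat-style text on stdin and print the
# share of CPU time spent in iowait since boot, as a whole-number percent.
iowait_percent() {
    awk '$1 == "cpu" {
        total = 0
        # Fields 2-9: user nice system idle iowait irq softirq steal.
        for (i = 2; i <= 9; i++) total += $i
        if (total > 0) printf "%d\n", ($6 * 100) / total
    }'
}

# Example: iowait_percent < /proc/stat
```

Comparing two readings taken a minute apart during a parity check or Mover run would give a better picture than the since-boot average.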
  11. Just wanted to post and say THANK YOU!!!!!! Thanks to the efforts of the guys recompiling for the NVIDIA driver I was able to load up RC7 tonight and give it a spin. The issue described above appears to be FIXED! My Go script no longer has to create a link to a cleartext password in order for me to boot my server - woohoo! I manually entered my password and the server started just fine - big Snoopy Dance! My thanks to the @limetech guys and to @dlandon for solving this - much appreciated!!
  12. I've always wanted an easy way to PXE boot, this sounds promising! Thank you!
  13. You have to admit updates have been coming pretty quickly, thus far I've seen no huge showstoppers and am pretty excited about the new code! Lots of improvements and some of the speed issues of the past seem solved.
  14. I was mentioning it as an aside as it was something odd I had noticed and I'm not sure what's causing it, RC related or otherwise. You need not be so defensive.
  15. I'm now on RC5. One thing I've noticed going down is that a second drive I've got mounted with Unassigned Devices seems to hang and be forced unmounted. Each time the array comes up it forces a parity check now. This started a few RC back but I figured it had to do with other things going on. It's formatted XFS and I see errors about the XFS drive not unmounting go by as it goes down and that drive is the only one I've got formatted XFS. Not clear to me what does it, I can say I don't stop VMs or containers before rebooting but this drive isn't used for that anyway <shrug>. Just mentioning it...
  16. Would that allow for the entry of a text passphrase? In essence that undoes the feature you've implemented but with the added benefit of deleting the file. I'm okay with that risk! I recognize other plug-ins could snatch it and that's a good catch to think of but I can't be but so paranoid 😮 You bring up a good point, I guess I'd ask - is this taking an edge case and extending it? Alternately, would it be possible to re-prompt for the passphrase when this occurs? I don't know what control you get when a drive mounts to fire a prompt. It occurs to me that if you could do that you could even use a different password for the mount maybe? This would allow for your "USB storage for backups to be attached" use case and also allow encryption. Not sure it's doable but maybe a solution? I mentioned my drive occasionally dropping out on my mobo controller, I've since moved it. In this case the issue you've pointed out would trip me up but when it's occurred I've always shut down in order to restart my containers and VMs so in a sense it's already hit me. The question would be - would this impact anyone else? So far I seem to be the only one whining, that said @SpaceInvaderOne was the one I got the idea from so others might be doing it too but not on the RC yet.
  17. I originally moved my VMs and Containers off of the cache because the size of the storage wasn't enough for moving large files - not without spending a pile. This was of particular importance when running the mover began tanking performance of the entire system. Like many I move video files around and those can get BIG. Run out of space on the Cache drive while moving files around and things come to a halt. I needed as much space for actual cache as possible! I currently move files nightly but can perhaps stop that now that mover doesn't kill things. If I can use more than one drive for cache and keep it encrypted then sure, that solves the issue and hopefully moving things over will be easy. Bear in mind that if you run Plex with a decent library you're talking over a million (tiny) files for metadata. My Containers and VMs take up over 240Gig right now so yeah, I moved them off my cache drive and I'm betting others have too. Putting this on the array is a non-starter, the pause when a sleeping drive spins up and general performance of spinning media makes that clear. I didn't move my files on a whim and I encrypt them to deny a thief, get your stuff stolen a time or two and you get angry like that. Look back and guess who it was that asked for crypto to be added years ago. As for backing stuff up - I'd want that encrypted too! Why bother encrypting otherwise? If I'm pulling sensitive data off for safe keeping I'd like it safe. Currently my personal backup is cloud storage and that's heavily encrypted too. Being able to plug in USB for transfer, connect to other datastores, and things I've never thought of though are awesome features but I've not been using those as much. Let me clear up how I'm currently forced to run my system. I've got a cleartext file on my USB containing my password. I have a line in my GO script that creates a link to it in the ephemeral filesystem on boot. My system currently boots hands-off and is completely insecure this way.
I've left it this way to continue testing the RC. I've seen some weirdness with UD dropping drives that I've attributed to controller issues and my logs filling with crap, but otherwise it's been stable overall. I still badly need Swarm support though! I've never tried uploading a file to boot my system, doing it on mobile might not even be possible. Using a file like a JPEG or whatever hadn't occurred to me until I saw someone else mention it. Switching to something like that at least makes it less obvious what cleartext file on my system is the password but mobile is still an issue. I wonder if the browser will try to remember my file selection? I have another server I can play with so perhaps I'll use that to work things out instead of risking my primary. I'll move to the larger cache "pool" when it's available if that keeps things secured but it's sounding like that's not exactly around the corner.
  18. So if I'm remote and need to restart my computer from my phone or tablet - not a PC - I may not be able to do so? It would seem we've just lost functionality here. I'm often far away from home when I'm forced to reboot or need to enter a password to decrypt and I do it from my phone or tablet - IOS. Storing a cleartext password on my PC vs storing it ephemerally on my USB stick (with an option to remove) appears to be less secure. As it stands now I can use a phrase or pull a password from a secured app. I've seen where others have tried to use image files and things for a password file and perhaps that's better were my password not already set. How will multiple pools change this situation? If you're saying that we'll be able to have drives outside of the array for VMs and Containers that's terrific - will we be able to encrypt them?
  19. How exactly is this a "fix"?! I'm using this method now and it requires me to place a cleartext password on my drive where it's easily accessible - this defeats the encryption completely! Frankly I liked it better where a temporary file was created because at least then it would be ephemeral if power was removed. Am I somehow misunderstanding you here? If I'm understanding you correctly this completely defeats the purpose of encrypting the drives - please reconsider!
  20. If you could please include the Overlay networking module for Docker Swarm it would be appreciated! Not yet checked RC4 but it wasn't in RC3, without it Swarm members cannot see other containers off the box and I'd like to include my server in a cluster. To solve the crypto issue I've placed a keyfile in my config directory and added the following line to my Go script prior to the emhttp line. It creates a link on the RAM-based root filesystem so the keyfile is seen on boot. -> ln -s /boot/config/keyfile /root
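The workaround could be made a little more defensive along these lines. This is only a sketch of the same `ln -s` idea: the function name `link_keyfile` is hypothetical, and the key still sits in cleartext on the flash drive, so it does nothing for the security concern.

```shell
#!/bin/bash
# Hypothetical helper for the go script: link a keyfile from the flash drive
# into a target directory, but only if the keyfile actually exists.
# Usage: link_keyfile <keyfile> <target-dir>
link_keyfile() {
    local keyfile="$1" target_dir="$2"
    if [ -f "$keyfile" ]; then
        # -f replaces a stale link left over from an earlier run.
        ln -sf "$keyfile" "$target_dir/$(basename "$keyfile")"
    else
        echo "keyfile $keyfile not found, skipping link" >&2
        return 1
    fi
}

# In the go script this would be called before the emhttp line, e.g.:
# link_keyfile /boot/config/keyfile /root
```

The existence check means a missing or renamed keyfile shows up as a message at boot instead of a dangling link.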
  21. RC3 hasn't been installed here yet but I'll try to do so today and report back. My password is under 25 chars in length. Edit: RC3 has the same behavior as RC1 with the added "benefit" that I can no longer go back to a working version of unRAID via the GUI. I've been able to edit the Go file from /boot/config to add my link file back to work around this for now. A parity check was triggered as well. Manual use of the emcmd against my drive resulted in a hang same as the other user, no output provided at the console from this command.
  22. @dlandon any thoughts on the diagnostic I posted the other day? Very curious as to what's going on
  23. I'm hoping Limetech will put the overlay module into their baseline kernel. If they can get to 5.4x we should see X570 chipset support too I believe.