BLKMGK

Members
  • Posts: 978


  1. Well, no dice. I removed the NIC - no change. I removed the 2 added M.2 drives - no change. I updated the woefully out-of-date BIOS on the mobo - no change. At that point the only things in the box were the GPU, my M.2 cache, and the new adapter, but at no point were the drives recognized, nor was I ever able to see anything from the adapter card that would allow me to configure it in any way. I did see an entry in the unRAID logs that mentioned it, else I'd have concluded the card was dead. At some point I'll get this card running in another machine and try out some of the SAS tools. For now, my machine has been down a day and I need it running, so I've rerun all of the older cabling and booted on the old PERC and 9203-16i. It seems to have come up fine, but I need to format the new cache drive to be complete. If anyone can tell me more about this 9305-24i adapter, I'm all ears! I won't be racking this for a little while so I can screw with it, but once I shove it into the rack I'm not going to be super excited about dragging it out.
  2. It's a full-length slot; I tried yanking the NIC out and saw no change. I'll see about putting it in the slot where the NIC had been and see if that makes a difference, but it will interfere with my USB stick. It does occur to me now that you've mentioned this - I now have a full complement of 3 M.2 drives onboard, and I seem to recall that these may also use PCIe lanes. I'll do some research and see if filling each of those impacts the slots; a distant memory says that's the case, but I'll verify it! Putting this card in another machine might also help - if it shows a BIOS there and not in my server, that'll be a clue. I'll report back in a bit, thanks for the nudge!
  3. I'm attempting to consolidate two previous SAS controllers down to this single controller. I've had it on my shelf a while but had never attempted to install it since it required a complete recable. I believe I have the correct breakout cables sourced, and it's wired to a SuperMicro backplane that has individual SATA connections. Upon bootup (ASRock Taichi) I see no BIOS for this adapter during POST and nothing in the UEFI concerning it either. unRAID boots fine except I see NO drives. When I examine my system using one of the unRAID plugins I've installed, I see a line that mentions an LSI SAS3224 SAS-3 card, so I believe the card is at least seen (there's a quick detection sketch after this list). My thought at the moment is to install this into a Windows machine I've got and attempt to access it with SAS3Flash, for which I believe I've found a current copy along with the latest firmware. What I'm unsure of is what configuration may be needed; my understanding is this card is IT mode out of the box - correct? Should I be seeing any options to access it during POST? Should there be notifications from it during POST? Having NO experience with this particular card I'm not sure what needs to be done - my hope had been it would simply work (lol). My system is down until I can get back to working on it this evening, but I'd appreciate any pointers on basic configuration of it - thanks!
  4. Updated from 6.9.2 all the way to 6.11.4 and appear to be running fine - Ryzen 3700. Large storage system with every disk encrypted, no issues. I halted all VMs and all containers prior to the update, made sure no files were open, and it went smoothly. Took the opportunity of downtime to upgrade a disk! Will have to reboot when that rebuild is complete to downgrade an NVIDIA driver that auto-updated. <sigh> The only quirk I've found annoying is that my Flash share is no longer visible. I assume this was a security change, but I'd like it back and will accept the risks involved. I do seem to need to recreate my SSH key too, but that's no biggie once I get my share back, heh. Overall it's running well (for the last hour anyway) and the upgrade was smooth! Edit: aaaand I got my Flash share back. On the Main screen, just click on the drive and enable the share - easy peasy once I dug a bit in the right place!
  5. Updated from 6.8.3 NVIDIA - went smoothly! I did remove some deprecated plugins and apps afterwards and had zero issues installing the new NVIDIA drivers - no changes to my Plex install were needed for transcoding, much to my relief. I'm on an AMD X570 mobo (Taichi) with a 3700, and with some fiddling in the temp settings app I now have temp readings for the first time! I am also seeing fan speeds for seven fans - very happy! Temps for my SSD and HDD continue to work. Not seeing power readings from my UPS, but I can see my GPU readings finally, so that's nice. I'll fiddle with the UPS when it's not 4am, lol. All in all this came through nice and smooth so far; my thanks to all those who tested and posted about their experiences, as I wasn't able to test this time around.
  6. Well, sitting at over 6 days of uptime now, including a full scheduled parity check without errors. I'm not yet positive what's cleared this error and that sux. Temps have been cooler and I've had the house open, but I've also had a second known-good UPS inline with the server. The only major things that changed between months of uptime and barely days were a new dedicated 30-amp electrical circuit and a new 2U UPS, and warmer temps coincided. I guess I'll be patient, but for now it's reliable - pretty frustrating. I guess I'll also mark the previously swapped PSU good and feel okay about keeping it with my spare chassis - it would've been nice had that been it, as that's the easiest thing to swap on a rackmount. I suppose I could put it in the chassis attached to the known-good UPS and see if the server toggles between them, but I'm not sure how I'd get the info. Anyway, I'll post anything new here and sure hope like heck it stays up and no one else goes through this grief.
  7. Not the power supply. Ran a memtest for over an hour, no issues. Have put a spare UPS inline between the recently installed LARGE UPS and the server to try and see if the issue is coming from the new rackmount UPS - the one I just had to run a new circuit to install. If it's the UPS I'll be upset, but happy to have found it. If this doesn't work, then I'll be booting in "safe mode" to see if that helps <sigh>, and probably running a day's worth of memtest!
  8. Well, that was short-lived - based on text messages I got from friends, it died at 1:30, so it was up maybe 5 hours. I have swapped in a new PSU and touched nothing else - it's begun a parity check. If it goes down this time I may try moving it off my uber-expensive UPS and onto a small portable one I used previously, just to rule that sucker out. The one it's on now has a dedicated circuit and brand new batteries, but it was put into use not long before these problems cropped up. I'll be pretty upset if that thing is the issue!
  9. I'm aware of the hazards of overclocking; I don't think I've owned a computer that wasn't overclocked in some way in the last 30 or so years, if not longer. This thing would be water-cooled if I could fit it without cutting. The settings now are close to stock, and Eco mode should be keeping it from boosting much while using less power. Underclocking is often done to save power, but I've got enough containers running that I'd prefer not to go that route unless pushed. SuperMicro chassis are quite loud out of the box. SuperMicro makes an SQ-model PSU, however, that's dead silent and will only run its fan when necessary; this easily cuts 50% of the noise. The rear fans are the next loudest, but with a little tweaking, fans that are normally used in the middle of the chassis can be used there, and that's what I've done. I did at one time try multiple third-party fans that everyone claimed worked - they didn't, lol. So yeah, this is cooled with good fans and isn't dead silent, but it's WAY quieter than stock for sure. The CPU cooler is the AMD unit; 80mm vertical coolers work in this chassis but not 120mm. The OEM cooler seems to be handling the job fine - and has pretty colors when I pop the top 😛 Update: For the first time in a month, as far as I've seen, a parity check completed successfully - whew! I feel a little more comfortable maybe swapping a drive; no additional SMART errors. Also no closer to solving the sudden reboots, but I think I can rule out the parity process - that's GOOD news at least. If it goes down again, that PSU is coming out for sure. Fingers crossed that's not soon and I can go back to my normal 6-month-long uptimes. Wish I had better resolution, but I'll keep watching it. Update 2: Well, that didn't last long - it failed sometime after 5am last night. No time to swap the PSU as I'm on the way out the door, but I guess I'll be doing that tonight. 64°F in the house last night, so certainly not a cooling issue, sheesh. Logs show nothing at all.
  10. This is a rackmount server chassis; it can actually accommodate 2x PSUs in a failover setup, but I've only got one installed currently. I have only briefly pondered the PSU, as it's pretty good quality and made for this kind of (ab)use. The chassis holds 24 spinners and can accommodate multiple SSDs too. I do actually have a spare - more than one if I'm willing to suffer turbine whine, come to think of it, lol. If it goes down again (it's 40% through the check now), I may do this first thing, as swapping it is actually one of the easiest things I can do! I'll be shocked if that's the problem, but as easily as it's replaced, and with a spare on hand, that seems a good first step. Oh, and yes, I cleaned the heatsink fins. This system has only been together just short of a year, but since it seemed like a heat issue, that was one of the first changes. Slowing the CPU and lowering voltages got temps down quite a bit (over 20C), so I no longer think this is heat. I also just realized I need to set up a syslog server somewhere to capture logs off-box; I hate that we lose them when a system goes down. I might try building a script to dump them onto disk storage (there's a rough sketch of one after this list). I just came home and had this machine sleeping, which broke the SSH connection, ugh. This would be WAY easier if the syslog gave me clues. I look forward to the next release, with its kernel having better support for my hardware and its sensors!
  11. About three weeks ago my server dropped offline and I found it sitting at the boot main screen awaiting my crypto key input. I have a pretty solid UPS and my hardware is pretty new, so this was puzzling; I was out of town at the time. I thought it might have gotten hot, but the logs (tailed in an SSH session) showed no errors. I brought the system up and it began a parity check. The last position I saw, about a day later, was some 90% complete - then it dropped again. It did this one more time and then I was home. Each time it seems to get close to completing parity and appears to cold boot; my logs show nothing untoward - no errors. This is an AMD 3700 on an ASRock Taichi board, 32GB of RAM, in a 4U SuperMicro case with good cooling fans. When I returned home I flashed the firmware to the latest version, lowered the clock speed, put the CPU in "eco mode", and increased fan speeds. Temps don't show on the main screen for me in unRAID, but in the diag screen of the BIOS I noted a significant drop in CPU temp. I also slowed my memory to the default 2400MHz vs. its rated 3000+ and lowered the RAM voltage somewhat. I ran Memtest through a couple of iterations, but not for terribly long as I wanted my server back up. About 19 hours later the system dropped again <sigh>. This time when it came up I halted the parity check (no errors noted in previous runs until it booted) and the system has run fine for an entire week - until today. Since this issue began this is the longest it's run, and I had surmised it was possibly an issue with parity building; now it's dropped again and I'm not so sure. The only thing I have seen in the way of drive errors is a single drive whose UDMA CRC error count in SMART reporting is slowly increasing. This is a Seagate 4TB drive; if I'm seeing 90+ percent complete, that drive shouldn't be in play. I'd like to replace it, but if I cannot complete a parity check, rebuilding to a larger drive isn't going to be possible I fear - and realizing it's not likely being accessed when this drops makes me wonder what the real issue is. My parity drive is 12TB, and I have a total of 17 data drives of various sizes - 4, 5, 8, and 10TB. I have some SSDs attached via SATA and a PCIe 4 NVMe as cache. I do have a backup, but that's a last gasp as it's a huge undertaking to restore. My fear is something will get corrupted during one of these boots and I'll lose encrypted files. My backups are daily and every file gets accessed - no errors occurring. I'm stumped and would like some suggestions, please! The lack of log entries is pretty frustrating but makes me wonder if this is a straight hardware issue. I'm hesitant to swap parts without a clearer indication of what's dorked up. I'm letting it run parity again now; the temp in the house has dropped with the outside temp, so heat should NOT be an issue - drive temps are 28-34C with most at the lower end of that range. If it drops this time, I'm thinking a day's worth of Memtest maybe? My drives are formatted XFS; I suppose they could be checked for corruption just in case (there's a read-only check sketch after this list). I might do that tonight, starting with the larger drives.
  12. Successful upgrade from an old RC - my funky fully-encrypted disk setup moved over just fine, thanks guys! Looking forward to the next revision supporting my X570 Taichi; I'll test when NVIDIA supports it! P.S. Docker Swarm?
  13. I don't have active cooling on it; the next time I pull it out I'll see what I can do about adding some. I have airflow since it's a SuperMicro chassis, but it's not ducted over there. Watching things further, I think the main issue could be the Mover process destroying performance when it runs. My cache drive is an M.2 PCIe 4 drive, but when Mover fires, the system becomes nearly unresponsive. Just frustrated, I suppose, as I've been bumping the space limits on the 1TB cache drive moving videos around of late, and performance tanks hard. I'm on 6.8 RC7, which has been stable, but it looks like I'm two revs behind, so perhaps there's help to be had there. I don't think I want the 6.8 release though, as I think that was a kernel step backwards! Parity check speeds seem low as well at 113MB/s - 100+TB of space with a 10TB parity drive. It takes a day to check (see the quick math after this list), but other than being "slow" it doesn't impact things too badly.
  14. Purchased one of these a while ago as I was no longer able to run 3x dual-port cards. New Ryzen boards don't have the slots, and I have to run a video card now (which I use for transcoding, so no biggie). I've noticed that I seem to bottleneck during parity checks, and when Mover strikes I see it bogging down too. Could I have made a better choice? I need to support a max of around 24 drives; I've got an expander kicking around but have never used it. Would I be better off using that somehow with an existing 8i two-port card? I see fairly significant IOWait times in NetData from time to time when really pushing data around, but some of this could be my drives. Can I do better for a reasonable cost? Edit: reading some other posts, it looks like I ought to have enough speed for spinning rust (rough bandwidth math after this list). Perhaps it's simply my drives after all, but seeing IOWait numbers climb from time to time is pretty frustrating!
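For the 9305-24i detection question above: before flashing anything, it's worth confirming the OS even enumerates the controller. This is a minimal sketch, assuming a Linux box with lspci installed and (optionally) Broadcom's sas3flash utility already on the PATH; the output handling is illustrative, not a vendor-blessed procedure.

```python
#!/usr/bin/env python3
"""Quick check: is an LSI/Broadcom SAS3 HBA (e.g. the SAS3224 on a 9305-24i) visible to the OS?

Assumes a Linux host with lspci installed; sas3flash is optional and only
invoked if it's already on the PATH.
"""
import shutil
import subprocess

def pci_sas_devices():
    """Return lspci lines for Broadcom/LSI devices (PCI vendor ID 1000)."""
    out = subprocess.run(["lspci", "-d", "1000:", "-nn"],
                         capture_output=True, text=True, check=True)
    return [line for line in out.stdout.splitlines() if line.strip()]

def sas3flash_listall():
    """If sas3flash is installed, ask it to list every controller it can see."""
    if shutil.which("sas3flash") is None:
        return "sas3flash not found on PATH - skipping (run it from the vendor package instead)"
    out = subprocess.run(["sas3flash", "-listall"], capture_output=True, text=True)
    return out.stdout or out.stderr

if __name__ == "__main__":
    devs = pci_sas_devices()
    if devs:
        print("PCI sees these LSI/Broadcom devices:")
        for d in devs:
            print("  " + d)
    else:
        print("No LSI/Broadcom PCI devices found - reseat the card or try another slot.")
    print()
    print(sas3flash_listall())
```

If lspci shows the SAS3224 but no disks appear, the card itself is probably alive and the problem is more likely firmware state or cabling - consistent with the SAS3224 line already showing up in the unRAID logs.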
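For the "dump syslog to disk off-box" idea above, here is a toy sketch of such a receiver. The port, bind address, and output path are assumptions; a real rsyslog/syslog-ng instance (or a built-in remote-syslog option, if the unRAID release in use has one) would be the more robust route.

```python
#!/usr/bin/env python3
"""Tiny UDP syslog sink: append every received line to a file on another box.

Illustrative only - LOG_PATH and PORT are assumptions. Run this on a second
machine and point the server's remote syslog at it so logs survive a crash.
"""
import socketserver
from datetime import datetime

LOG_PATH = "/mnt/storage/server-syslog.txt"  # hypothetical destination on the capture box
PORT = 514  # standard syslog/UDP port; needs root, or pick something above 1024

class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is (data, socket).
        data = self.request[0].decode("utf-8", errors="replace").strip()
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(LOG_PATH, "a") as fh:
            fh.write(f"{stamp} {self.client_address[0]} {data}\n")

if __name__ == "__main__":
    with socketserver.UDPServer(("0.0.0.0", PORT), SyslogHandler) as server:
        print(f"listening on udp/{PORT}, appending to {LOG_PATH}")
        server.serve_forever()
```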
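For checking the XFS disks mentioned above without risking writes, a sketch of a no-modify pass with xfs_repair -n. The device names are placeholders only; on unRAID the array filesystems are normally checked from the GUI with the array in maintenance mode, so treat this purely as an illustration of the read-only flag.

```python
#!/usr/bin/env python3
"""Run xfs_repair in no-modify mode (-n) against a list of devices and report problems.

Device paths below are hypothetical placeholders - substitute the correct
(unmounted!) devices for your setup before running anything like this.
"""
import subprocess

DEVICES = ["/dev/placeholder1", "/dev/placeholder2"]  # placeholders, NOT real device names

def check_xfs(device):
    """xfs_repair -n only reports; a non-zero exit status means corruption was found."""
    result = subprocess.run(["xfs_repair", "-n", device],
                            capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr

if __name__ == "__main__":
    for dev in DEVICES:
        code, output = check_xfs(dev)
        status = "clean" if code == 0 else f"PROBLEMS (exit {code})"
        print(f"{dev}: {status}")
        if code != 0:
            print(output)
```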
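On the day-long parity check above, the arithmetic lines up: the check has to read the full 10TB parity drive end to end, so at the reported average speed it lands at roughly a day no matter how much total array space sits behind it. A quick throwaway calculation with the numbers from the post (decimal TB/MB):

```python
# Rough parity-check duration from the figures in the post above.
parity_tb = 10            # parity drive size, TB (decimal)
avg_speed_mb_s = 113      # reported average check speed, MB/s

seconds = (parity_tb * 1_000_000) / avg_speed_mb_s   # 10 TB = 10,000,000 MB
hours = seconds / 3600
print(f"~{hours:.1f} hours")   # about 24.6 hours - roughly a day, as observed
```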
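And on whether an expander behind an 8i card could feed ~24 spinners: the back-of-envelope math below suggests yes, with the caveat that the numbers are assumptions (SAS2 6Gb/s links, both x4 ports of the 8i card cabled to the expander; a single x4 uplink would halve the result).

```python
# Back-of-envelope: is one 8i HBA + expander enough for 24 spinning drives?
lanes = 8                  # an "8i" card exposes 8 internal SAS lanes (2x SFF-8643/8087 ports)
lane_gbps = 6              # assume SAS2 (6Gb/s) links; SAS3 would double this
efficiency = 0.8           # rough allowance for encoding and protocol overhead

uplink_mb_s = lanes * lane_gbps * 1000 / 8 * efficiency   # total shared bandwidth, MB/s
drives = 24
per_drive = uplink_mb_s / drives

print(f"shared uplink ~ {uplink_mb_s:.0f} MB/s, or ~ {per_drive:.0f} MB/s per drive "
      f"with all {drives} drives streaming at once")
# ~4800 MB/s total, ~200 MB/s per drive - in the ballpark of what a modern 3.5" HDD
# sustains at its outer tracks, so spinning rust shouldn't be starved.
```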