BLKMGK

Members
  • Posts

    978
Posts posted by BLKMGK

  1. Well, no dice. I removed the NIC - no change. I removed the 2 added M.2 drives - no change. I updated the woefully out-of-date BIOS on the mobo - no change. At that point the only things in the box were the GPU, my M.2 cache, and the new adapter, but at no point were the drives recognized, nor was I ever able to see anything from the adapter card that would let me configure it in any way. I did see an entry in the unRAID logs that mentioned it; otherwise I'd have concluded the card was dead.

     

    At some point I'll get this card running in another machine and try out some of the SAS tools. For now, my machine has been down a day and I need it running so I've rerun all of the older cabling and booted on the old PERC and 9203-16i :( It seems to have come up fine but I need to format the new cache drive to be complete. If anyone can tell me more about this 9305-24i adapter I'm all ears! I won't be racking this for a little while so I can screw with it but once I shove it into the rack I'm not going to be super excited about dragging it out :P

  2. It's a full-length slot; I tried yanking the NIC out and saw no change. I'll see about putting it in the slot where the NIC had been to see if that makes a difference, but it will interfere with my USB stick. It does occur to me now that you've mentioned this - I now have a full complement of 3 M.2 drives onboard, and I seem to recall that these may share PCIe lanes with the slots. I will do some research and see if filling all of them impacts the slots. A distant ringing memory says that's the case but I'll verify it! Putting this card in another machine might also help: if it shows a BIOS there and not in my server, that'll be a clue. I'll report back in a bit, thanks for the nudge!

  3. I'm attempting to consolidate two previous SAS controllers into this single controller. I've had this on my shelf a while but had never attempted to install it since it required a complete recable. I believe I have the correct breakout cables sourced, and it's wired to a SuperMicro backplane that has individual SATA connections. Upon bootup (ASRock Taichi) I see no BIOS for this adapter during POST and nothing in the UEFI concerning it either. unRAID boots fine except I see NO drives. When I examine my system using one of the unRAID plugins I've installed, I see a line that mentions an LSI SAS3224 SAS-3 card, so I believe the card is at least seen. 

     

    My thought at the moment is to install this into a Windows machine I've got and attempt to access it with SAS3Flash, for which I believe I've found a current copy with the latest firmware. What I'm unsure of is what configuration may be needed; my understanding is this card is in IT mode out of the box - correct? Should I be seeing any options to access it during POST? Should there be notifications from it during POST? Having NO experience with this particular card, I'm not sure what needs to be done - my hope had been it would simply work (lol). My system is down until I can get back to working on it this evening, but I'd appreciate any pointers on basic configuration of it - thanks!
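Since the plugin already reports the SAS3224, a quick next check from the console is whether a kernel driver has actually claimed the card. A minimal Python sketch, assuming `lspci` (pciutils) is available and run as `lspci -k`, which prints a "Kernel driver in use:" line under each device that has one; for this controller family that driver should be `mpt3sas`. The parsing approach is an illustration, not something from the thread.

```python
import subprocess


def read_lspci() -> str:
    """Return `lspci -k` output (assumes pciutils is installed; not called on import)."""
    return subprocess.run(["lspci", "-k"], capture_output=True, text=True, check=True).stdout


def find_sas3224(lspci_text: str) -> list[dict]:
    """List every SAS3224 controller in `lspci -k` text and the kernel driver bound to it."""
    devices = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # A new device record starts in column 0, e.g. "0b:00.0 Serial Attached SCSI controller: ..."
            current = None
            if "SAS3224" in line:
                current = {"slot": line.split()[0], "driver": None}
                devices.append(current)
        elif current is not None and "Kernel driver in use:" in line:
            # Indented detail line printed by -k; driver stays None if nothing claimed the card
            current["driver"] = line.split(":", 1)[1].strip()
    return devices
```

Feed it `read_lspci()` on the server. A SAS3224 entry with no driver bound would point more toward the card or its firmware; `mpt3sas` bound with still no disks visible makes the cables or backplane the likelier suspects.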

  4. Updated from 6.9.2 all the way to 6.11.4 and appear to be running fine - Ryzen 3700. Large storage system with every disk encrypted, no issues. I halted all VMs and all containers prior to the update, made sure no files were open, and it went smoothly. Took the opportunity of the downtime to upgrade a disk! Will have to reboot when that rebuild is complete to downgrade an NVIDIA driver that auto-updated. <sigh>

     

    Only quirk I've found annoying is my Flash share is no longer visible. I assume this was a security change but I'd like it back and will accept the risks involved. I do seem to need to recreate my SSH key too but no biggie once I get my share back heh. Overall running well (for the last hour anyway) and the upgrade was smooth!

     

    Edit: aaaand I got my Flash drive back. On the Main screen just click on the drive and enable the share, easy peasy once I dug a bit in the right place!

  5. Updated from 6.8.3 NVIDIA - went smoothly! I did remove some deprecated plug-ins and apps afterwards and had zero issues installing the new NVIDIA drivers - no changes to my Plex install for transcoding needed, much to my relief. I'm on an AMD X570 mobo (Taichi) with a 3700, and with some fiddling in the temp settings app I now have temp readings for the first time! I am also seeing fan speeds for seven fans - very happy! Temps for my SSD and HDD continue to work. Not seeing power readings from my UPS, but I can finally see my GPU readings, so that's nice. I'll fiddle with the UPS when it's not 4am lol.

     

    All in all this came through nice and smooth so far. My thanks to all those who tested and posted about their experiences, as I wasn't able to test this time around.

  6. Well, sitting at over 6 days uptime now, including a full scheduled parity check without errors. I'm not yet positive what's cleared this error and that sux. Temps have been cooler and I've had the house open, but I've also had a second known good UPS inline with the server. The only major changes that coincided with going from months of uptime to barely days were a new dedicated 30-amp electrical circuit, a new 2U UPS, and warmer temps. I guess I'll be patient, but for now it's reliable - pretty frustrating. I guess I'll also mark the previously swapped PSU as good and feel okay about keeping it with my spare chassis - it would've been nice had that been it, as that's the easiest thing to swap on a rackmount. I suppose I could put it in the chassis attached to the known good UPS and see if the server toggles between them, but I'm not sure how I'd get the info. Anyway, I'll post anything new here and sure hope like heck it stays up and no one else goes through this grief.

  7. Not the power supply. Ran a memtest for over an hour, no issues. Have put a spare UPS in line between the recently installed LARGE UPS and the server to try and see if the issue is coming from the new rackmount UPS - the one I just had to run a new circuit to install. If it's the UPS I'll be upset but happy to have found this. If this doesn't work then I'll be booting it in "safe mode" to see if that helps <sigh> and probably running a day's worth of memtest!

  8. Well, that was short-lived - based on text messages I got from friends, it died at 1:30, so it was up maybe 5 hours. I have swapped in a new PSU and touched nothing else - it's begun a parity check. If it goes down this time I may try moving it off my uber expensive UPS and onto a small portable one I used previously just to rule that sucker out. The one it's on now has a dedicated circuit and brand new batteries, but was put in use not long before these problems cropped up. I'll be pretty upset if that thing is the issue!

  9. I'm aware of the hazards of overclocking; I don't think I've owned a computer that wasn't overclocked in some way in the last 30 or so years, if not longer. This thing would be water cooled if I could fit it without cutting. The settings now are close to stock, and Eco mode should be keeping it from boosting much and using less power. Underclocking is often done to save power, but I've got enough containers running that I'd prefer not to go that route unless pushed. 

     

    SuperMicro chassis out of the box are quite loud. SuperMicro makes an SQ model PSU, however, that's dead silent and will only run its fan when necessary; this easily cuts 50% of the noise. The rear fans are the next loudest, but with a little tweaking, fans that are normally used in the middle of the chassis can be used there, and that's what I've done. I did at one time try multiple third-party fans that everyone claimed worked - they didn't lol. So yeah, this is cooled with good fans and not dead silent :) But WAY quieter than stock for sure. The CPU cooler is the AMD unit; 80mm vertical coolers work in this chassis but not 120mm. The OEM cooler seems to be handling the job fine - and has pretty colors when I pop the top 😛 

     

    Update: For the first time in a month, as far as I've seen, a Parity Check completed successfully - whew! I feel a little more comfortable maybe swapping a drive, and there are no additional SMART errors. I'm also no closer to solving the sudden reboots, but I think I can rule out the parity process - that's GOOD news at least. If it goes down again that PSU is coming out for sure. Fingers crossed that's not soon and I can go back to my normal six-month-long uptime. Wish I had better resolution but I'll keep watching it.

     

    Update 2: Well, that didn't last long - it failed sometime after 5am last night. No time to swap the PSU as I'm on the way out the door, but I guess I'll be doing that tonight. It was 64°F in the house last night, so certainly not a cooling issue, sheesh. Logs show nothing at all.

  10. 7 hours ago, Frank1940 said:

    Can we assume that you checked to make sure the CPU cooling fins were clean?  

     

    One thing to consider is the PS. There have been several instances over the past several months where replacement of the PS has cleared up situations such as you have described. With 17 data drives, it would probably be difficult to borrow one from a friend. You might want to consider purchasing one from a vendor with a generous return policy. (If you do order one, make sure it has a single +12V rail and an adequate current rating to accommodate the 36-to-48A current surge when all of the drives spin up.)  

     

    This is a rack mount server chassis, it can actually accommodate 2x PSU in a failover setup but I've only got one installed currently. I have only briefly pondered the PSU as it's pretty good quality and made for this kind of (ab)use. The chassis holds 24 spinners and can accommodate multiple SSD too. I do actually have a spare, more than one if I'm willing to suffer turbine whine come to think of it lol. If it goes down again (it's at 40% thru the check now) I may do this first thing as swapping it is one of the easiest things I can do actually! I'll be shocked if that's the problem but as easily as it's replaced and with a spare that seems a good first step.
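Frank's 36-to-48A range lines up with simple arithmetic if you assume roughly 2A of +12V current per 3.5" drive at spin-up (an assumed rule of thumb, not a figure from his post): 18 drives (17 data plus parity) gives the low end and a fully populated 24-bay chassis the high end. A small sketch; check the real per-drive figure on each drive's datasheet.

```python
def spinup_current_amps(drive_count: int, amps_per_drive: float = 2.0) -> float:
    """Worst-case +12 V draw if every drive spins up at the same moment.

    amps_per_drive=2.0 is an assumed rule-of-thumb figure for 3.5" drives;
    the real number is on each drive's datasheet.
    """
    return drive_count * amps_per_drive


def spinup_watts(drive_count: int, amps_per_drive: float = 2.0, volts: float = 12.0) -> float:
    """The same surge expressed as power on the +12 V rail."""
    return spinup_current_amps(drive_count, amps_per_drive) * volts


# 17 data drives + 1 parity today, versus a fully populated 24-bay chassis
for bays in (18, 24):
    print(f"{bays} drives: {spinup_current_amps(bays):.0f} A, {spinup_watts(bays):.0f} W")
```

The point of the single-rail advice is that this whole surge lands on +12V at once, so a PSU's +12V rating matters more than its total wattage label.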

     

    Oh and yes I cleaned the heatsink fins. This system has only been together just short of a year but since it seemed like a heat issue that was one of the first changes. Slowing the CPU and lowering voltages got temps down quite a bit (over 20C) so I no longer think this is heat :( 

     

    I also just realized I need to set up a syslog server somewhere to capture logs off-box; I hate that we lose them when a system goes down. I might try building a script to dump them onto disk storage. I just came home and had this machine sleeping, which broke the SSH connection, ugh. This would be WAY easier if the syslog gave me clues. I look forward to the next release, with its kernel having better support for my hardware and its sensors!
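On the "script to dump them onto disk storage" idea: unRAID keeps /var/log in RAM, so a periodic copy to the array preserves whatever was logged up to the last copy before a crash. A minimal sketch, assuming the log lives at /var/log/syslog; the destination share /mnt/user/logs and the 48-snapshot retention are hypothetical choices.

```python
import shutil
import time
from pathlib import Path


def snapshot_syslog(src: str = "/var/log/syslog", dest_dir: str = "/mnt/user/logs") -> Path:
    """Copy the RAM-backed syslog to array storage under a timestamped name.

    /mnt/user/logs is a hypothetical share; point dest_dir at any persistent location.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / f"syslog-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    shutil.copy2(src, target)  # copy2 also keeps the source file's timestamps
    return target


def prune_snapshots(dest_dir: str = "/mnt/user/logs", keep: int = 48) -> None:
    """Delete all but the newest `keep` snapshots (the timestamped names sort chronologically)."""
    snaps = sorted(Path(dest_dir).glob("syslog-*.txt"))
    for old in snaps[: max(len(snaps) - keep, 0)]:
        old.unlink()
```

Scheduling it every few minutes (cron or the User Scripts plugin) is left to you. A remote syslog server remains the sturdier option, since a polling copy can miss the final lines before a hard reset.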

  11. About three weeks ago my server dropped offline and I found it sitting at the main boot screen awaiting my crypto key input. I have a pretty solid UPS and my hardware is pretty new, so this was puzzling; I was out of town at the time. I thought that it might have gotten hot, but the logs (tailed in an SSH session) showed no error. I brought the system up and it began a parity check. The last position I saw about a day later was some 90% complete - then it dropped again. It did this one more time and then I was home. Each time it seems to get close to completing parity and appears to cold boot; my logs show nothing untoward - no errors.

     

    This is an AMD 3700 on an ASRock TaiChi board, 32gig RAM, 4U SuperMicro case with good cooling fans. When I returned home I flashed the firmware to the latest version, lowered the clock speed, put the CPU in "eco mode", and increased fan speeds. Temps don't show on the main screen for me in unRAID, but in the diag screen of the BIOS I noted a significant drop in CPU temp. I also slowed my memory to a default 2400MHz vs. its rated 3K+ and lowered the RAM voltage somewhat. I ran Memtest through a couple of iterations but not for terribly long, as I wanted my server back up. About 19 hours later the system dropped again <sigh> This time when it came up I halted the parity check (no errors noted in previous runs until it booted) and the system has run fine for an entire week - until today. Since this issue began this is the longest it's run, and I had surmised it was possibly an issue with parity building; now it's dropped again and I'm not so sure.

     

    The only thing I have seen in the way of drive errors is a single drive that's showing a slowly increasing count of UDMA CRC errors in SMART reporting. This is a Seagate 4TB drive; if I'm seeing 90+ percent complete on a 12TB parity drive, the check is already past the 10TB mark, so that drive shouldn't be in play. I'd like to replace it, but if I cannot complete a parity check, I fear rebuilding to a larger drive isn't going to be possible, and realizing it's not likely being accessed when this drops makes me wonder what the real issue is. My Parity drive is 12TB, and I have a total of 17 data drives of various sizes - 4, 5, 8, and 10TB. I have some SSDs attached via SATA and a PCIe 4 NVMe as cache. I do have a backup, but that's a last gasp as it's a huge undertaking to restore. My fear is something will get corrupted during one of these boots and I'll lose encrypted files. My backups are daily and every file gets accessed - no errors occurring.
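The reasoning about the 4TB drive can be made concrete. During a parity check every disk is read at the same offset, from the start up to the size of the parity drive, so a data disk drops out once the position passes its capacity. A small sketch using the nominal sizes from this post (decimal TB, so real capacities differ slightly):

```python
def drives_in_play(progress: float, parity_tb: float, data_drives_tb: list[float]) -> list[float]:
    """Sizes of data drives that still have sectors left to read at a given parity-check progress.

    A parity check sweeps every disk from offset 0 up to the parity size, so once the
    position passes a drive's capacity that drive is no longer read.
    """
    position_tb = progress * parity_tb
    return [size for size in data_drives_tb if size > position_tb]


# 12TB parity with 4, 5, 8 and 10TB data drives (nominal sizes)
print(drives_in_play(0.90, 12, [4, 5, 8, 10]))  # prints [] (only parity is still being read)
print(drives_in_play(0.50, 12, [4, 5, 8, 10]))  # prints [8, 10]
```

At 90% the check sits around 10.8TB in, beyond every data drive, which supports the suspicion that the failing 4TB Seagate isn't what's being touched when the box drops.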

     

    I'm stumped and would like some suggestions please! The lack of log entries is pretty frustrating but makes me wonder if this is a straight hardware issue. I'm hesitant to swap parts without a clearer indication of what's dorked up. I'm letting it run parity again now; the temp in the house has dropped with the outside temp, so temp should NOT be an issue - drive temps are 28-34C with most on the lower end of that range. If it drops this time I'm thinking a day's worth of Memtest maybe? My drive formats are XFS; I suppose that could be checked for corruption just in case. I might do that tonight, starting with the larger drives.
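For the XFS check, `xfs_repair -n` runs in no-modify mode and only reports problems; its man page documents exit status 1 for detected corruption and 0 for none in that mode. A minimal sketch that builds the commands and orders disks largest-first. Assumption: the array is started in maintenance mode and encrypted disks appear as /dev/mapper/mdN (unencrypted as /dev/mdN); the device naming and the disk-number-to-size mapping are placeholders to verify on the actual system.

```python
import subprocess


def xfs_check_cmd(disk_number: int, encrypted: bool = True) -> list[str]:
    """Build a read-only XFS check command for one array disk.

    Assumption: in maintenance mode, encrypted disks appear as /dev/mapper/mdN and
    unencrypted ones as /dev/mdN; naming can differ between releases.
    """
    device = f"/dev/mapper/md{disk_number}" if encrypted else f"/dev/md{disk_number}"
    # -n is no-modify mode: xfs_repair reports problems but never writes to the disk
    return ["xfs_repair", "-n", device]


def check_order(disk_sizes_tb: dict[int, float]) -> list[int]:
    """Disk numbers ordered largest first, matching the plan to start with the big drives."""
    return sorted(disk_sizes_tb, key=disk_sizes_tb.get, reverse=True)


def run_check(disk_number: int, encrypted: bool = True) -> bool:
    """True only on exit status 0; with -n, status 1 means corruption was detected."""
    result = subprocess.run(xfs_check_cmd(disk_number, encrypted))
    return result.returncode == 0
```

Checking through the md devices rather than the raw disks is the usual advice so parity stays consistent, though with `-n` nothing is written either way.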

  12. Successful upgrade from an old RC - my funky full encrypted disk setup moved over just fine, thanks guys! Looking forward to the next revision to support my X570 TaiChi, will test when NVIDIA supports it! :)

     

    P.S. Docker Swarm? :D

  13. On 1/7/2020 at 11:54 AM, smegger68 said:

    Not a bad choice of HBA at all.

     

    Do you have active cooling on it? The 16 port cards get hot as hell when pushing a lot of data, the 8 port ones are bad enough. Could cause throttling if you don't have a fan on it. remember, these were designed for the hurricane force cooling inside servers.

    I don't have active cooling on it; next time I pull it out I'll see what I can do about adding some cooling. I have airflow as it's a SuperMicro chassis, but it's not ducted over there. 

    Watching things further, I think the main issue could be the Mover process destroying performance when it runs. My cache drive is an M.2 PCIe 4 drive, but when Mover fires the system becomes nearly unresponsive. Just frustrated I suppose, as I bump the space limits on the 1TB cache drive moving videos around of late and performance tanks hard. I'm on 6.8 RC7, which has been stable, but it looks like I'm two revs behind so perhaps there's help to be had there. I don't think I want the 6.8 release though, as I think that was a kernel step backwards!

     

    Parity checks seem slow as well, at 113MB/s - 100+TB of space with a 10TB parity drive. It takes a day to check, but other than being "slow" it doesn't impact things too badly.
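The "takes a day" figure checks out: a check has to read the full 10TB parity drive, and at an average of 113MB/s that is about 88,500 seconds. A small Python sketch of the arithmetic, assuming decimal units and a constant average speed (real drives slow down toward the inner tracks, so the instantaneous rate varies):

```python
def parity_check_hours(parity_tb: float, avg_mb_per_s: float) -> float:
    """Hours for one full parity pass at a constant average speed.

    Decimal units throughout (1 TB = 10**12 bytes, 1 MB = 10**6 bytes), as drive capacities are quoted.
    """
    seconds = parity_tb * 10**12 / (avg_mb_per_s * 10**6)
    return seconds / 3600


print(f"{parity_check_hours(10, 113):.1f} h")  # prints 24.6 h
```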

  14. Purchased one of these a while ago as I was no longer able to run 3x dual-port cards. New Ryzen boards don't have the slots, and I now had to run a video card (which I use for transcoding, so no biggie). I've noticed that I seem to bottleneck during parity checks, and when Mover strikes I see it bogging down too. Could I have made a better choice? I need to support a max of around 24 drives, and I've got an expander kicking around but have never used it. Would I be better off using that somehow with an existing 8i 2-port card? I see fairly significant IOWait times in NetData from time to time when really pushing data around, but some of this could be my drives.

     

    Can I do better for a reasonable cost?

     

    Edit: reading some other posts it looks like I ought to have enough speed for spinning rust. Perhaps it's simply my drives after all but seeing IOwait numbers climb from time to time is pretty frustrating!
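Whether the HBA link is the bottleneck mostly comes down to dividing its bandwidth by the number of drives read at once, which in a parity check is all of them. A sketch, assuming an x8 slot; the PCIe generation and width of any particular card are assumptions to check against its spec sheet. The same division applies to drives behind an expander, since they all share the upstream card's link.

```python
# Usable bandwidth per PCIe lane in MB/s after line encoding:
# gen 2 runs 5000 MT/s with 8b/10b, gen 3 runs 8000 MT/s with 128b/130b; 8 bits per byte.
PER_LANE_MB_S = {2: 5000 * 8 / 10 / 8, 3: 8000 * 128 / 130 / 8}


def per_drive_mb_s(pcie_gen: int, lanes: int, active_drives: int) -> float:
    """Ceiling on each drive's share of the HBA's PCIe link when all drives stream at once.

    Ignores packet/protocol overhead, so real throughput will be somewhat lower.
    """
    return PER_LANE_MB_S[pcie_gen] * lanes / active_drives


for gen, drives in ((2, 16), (3, 24)):
    print(f"PCIe {gen}.0 x8, {drives} drives: ~{per_drive_mb_s(gen, 8, drives):.0f} MB/s each")
```

If the per-drive figure sits above what a spinning drive can sustain (very roughly 150-250MB/s for recent large 3.5" drives), the link is unlikely to be the limit and the slowest drives set the pace.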

  15. 16 hours ago, dlandon said:

    You've already posted this as a 6.8 bug.  Double posting doesn't get you an answer any faster and adds confusion.  We will work on this issue there, not here.

    I posted it there as it appeared to be a bug in 6.8; I had missed the change in the release notes and it was pointed out. At that point I assumed it was the plug-in not having been updated and thus asked here. I will follow this there if you would prefer, but I wasn't attempting to seek a solution in multiple places at once.

  16. 2 hours ago, dlandon said:

    It has been updated to work with the api.  Post your diagnostics so I can take a look.

    Hopefully this has what you need! I'm currently running a link file in my Go so I'd have to break it to recreate but this isn't a long term fix obv. If you need me to do anything say the word and I'll work it as quick as I can to help out if there's an issue.

    minion-diagnostics-20191014-0132.zip

  17. Allow me to actually answer the question. Converting from entering a passphrase to a keyfile is something I *just* had to do. The links above will tell you how to setup crypto just fine or even change passwords but you'll have to dig pretty good to find a way to easily go from passphrase to a file so I'll try to save you some trouble. In the end you need to have a file in the root of your ephemeral boot drive named keyfile that contains your passphrase. In my case I did this by creating a file on my USB stick and a line in my GO file to create a link to it as detailed below. Note that this is NOT SECURE if someone snatches your server. For me this is a temporary thing until I can go back to entering it by hand down the road. Others have come up with schemes to transfer the file in question via SFTP from other servers and it's also possible to use binary files like say a picture vs a text file - what I've detailed isn't that but you get the idea.
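The exact Go-file line from this post isn't preserved here, so the following is a hypothetical Python equivalent of the same idea, not the original line. Assumptions: unRAID reads the key from /root/keyfile (in the RAM-backed root filesystem), and /boot/config/keyfile is just an example location on the flash drive. Same warning as above: anyone holding the USB stick holds your key.

```python
from pathlib import Path


def write_passphrase(path: str, passphrase: str) -> None:
    """Store the passphrase with no trailing newline.

    Assumption: the key file is used byte-for-byte, so an editor-added newline would
    make it a different key from the passphrase typed in the web UI.
    """
    Path(path).write_bytes(passphrase.encode("utf-8"))


def link_keyfile(stored_key: str = "/boot/config/keyfile",  # hypothetical spot on the USB stick
                 link_path: str = "/root/keyfile") -> Path:
    """Point link_path at the stored key; /root is in RAM, so this must rerun every boot."""
    link = Path(link_path)
    if link.is_symlink() or link.exists():
        link.unlink()  # clear a stale link or key left from an earlier run
    link.symlink_to(stored_key)
    return link
```

Using a link rather than a copy means deleting or replacing the file on the stick takes effect on the next array start without touching the boot script.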

     

     

  18. 6.8.0 rc3 may have changed functionality that could be impacting Unassigned Drives. Specifically I have a drive outside of the array I mount using Unassigned Drives that is encrypted that fails to mount either manually or via AutoMount when I upgrade to 6.8.0 rc1. I can SSH into the array and display contents of all drives except the single Unassigned Drive I use for containers and VMs.

     

    Here's a link to the post in the prerelease area. Not clear yet where the issue lies but this worked fine in at least two previous releases and it seems behavior changed with the new 6.8 v1 release candidate.
     

     

     

    Edit: D'oh! There's a statement in the release notes that mentions they no longer save the decryption key to a file and that an API has been provided to allow Unassigned Drives to continue to function. Seems this is intentional, intended, and my use case was anticipated!

     

    Pretty please may I volunteer to test should that be needed assuming Unassigned Drives is updated? :D 

  19. Heading over to post a bug; the release was a no-go for me. So close, and looking forward to the new kernel and hardware support :)

     

    Edit: Not a bug, a feature! Unassigned Drives needs an update to continue to work with encrypted drives is all. A security update has changed the way you decrypt drives outside the array. Worked around it for now with a link file in my Go but it's not a long-term fix.

  20. On 8/13/2019 at 12:23 AM, Leoyzen said:

    Hi, I just built a kernel to support X570 motherboards (mine is an MSI X570 ACE) and the latest AMD Ryzen 2 3000 family CPUs.

    The kernel version is 5.2.7 with AMD patches from AMD Ryzen 3000 series - Linux support and virtualization.

    Fix:

    1. X570 PCI VFIO bug
    2. AMD Ryzen CPU support for k10temp
    3. newest nct6775 driver for most X570 sensors (many motherboards use an nct6797 as the Super I/O chip)
    4. Realtek r8125 NIC driver
    5. other features that come with Linux kernel 5.2.7 (such as Coffee Lake iGPU support, etc.)

    The board sensors work fine (nct6775 and k10temp).

    Anyone who has problems (such as drivers not supported by 4.19.56, the kernel Unraid currently uses) can also try it to see if it helps.

     

    Unraid works fine for me, but I can't guarantee it works for all of you. USE THE KERNEL AT YOUR OWN RISK.

    (PS: I tried to backport AMD and sensor support to the kernel Unraid 6.7 uses, but failed)

     

    If you have problems with this kernel, comment below and I'll see if I can help.



     

    Attempting to give your kernel a whirl! I attempted to compile my own for testing but ran into issues, which I'm still trying to solve, where GCC appears broken using your script. However, when I use your kernel I get an error about an "incompatible md version". Which version of unRAID was your kernel built for? 6.7 hasn't worked for me and I'd really like to get this working! I now have an AMD processor in my server, and swarm would be awesome for a project I'm working on. If we can prove this out, perhaps we can get Limetech onboard with the proper modules and it could trickle down to the NVIDIA guys :D

     

    Thanks!
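With a patched kernel like this, a quick way to confirm the sensor drivers actually loaded is to read the chip names the kernel exposes under /sys/class/hwmon (each hwmonN directory has a `name` file). A minimal sketch; the "nct67" prefix is an assumption meant to cover the NCT67xx chip names the nct6775 driver reports.

```python
from pathlib import Path


def hwmon_chips(root: str = "/sys/class/hwmon") -> dict[str, str]:
    """Map each hwmon device (hwmon0, hwmon1, ...) to the chip name its driver reports."""
    chips = {}
    for dev in sorted(Path(root).glob("hwmon*")):
        name_file = dev / "name"
        if name_file.is_file():
            chips[dev.name] = name_file.read_text().strip()
    return chips


def missing_sensors(chips: dict[str, str], wanted: tuple[str, ...] = ("k10temp", "nct67")) -> list[str]:
    """Return the wanted name prefixes that no hwmon device matches.

    k10temp reports itself as "k10temp"; the nct6775 driver reports the specific
    Super I/O chip (e.g. "nct6797"), hence the "nct67" prefix.
    """
    return [w for w in wanted if not any(name.startswith(w) for name in chips.values())]
```

Run `missing_sensors(hwmon_chips())` on the server; an empty list means both drivers registered, which is a separate question from whether unRAID's dashboard picks the readings up.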

  21. Okay, much to my surprise I'm able to access SATA drives via the U.2 port, thank goodness! I am using a U.2 cable to my SAS backplane on a Norco 24. With the new flash I had to turn on the SATA vs. U.2 functionality in the BIOS. Here's what I bought ->

     

    CableCreation Internal Mini SAS HD Cable, 1.6FT Mini SAS SFF-8643 to Mini SAS 36Pin SFF-8087 Cable, Mini SAS 36Pin to SFF-8643 Cable, 0.5M

     

    No go on the IPMI, but I'll admit I've not tried to dork with it; I'm not sure where to begin with the damn thing. It does look like I'm using the Realtek NIC, and so far it's been fine and found by Linux. With this U.2 addition I might actually use this WKS board in my unRAID machine and a TaiChi in something else - anyone using the TaiChi? Thinking a 3700X, 32gig of memory, and a 4-port SAS paired with a 2-port to have enough slots and still run my 1050 for Plex :D

  22. I have the WKS board too, albeit not in an unRAID system currently. I can confirm that its U.2 port doesn't work as a SAS port with a breakout - despite being told so when I bought it. I'll be flashing mine in a moment and I'll try to test it with the breakout I've got, but I doubt that a simple flash is going to bring functionality. This board was advertised as having "lights out" management and this special port; thus far both have proven to be useless. Knowing what I know now I wouldn't have bought it. My unRAID system is likely to get a TaiChi X570 instead - possibly this week. Wish I could find some 3900X processors :(
