TexasDaddy

Everything posted by TexasDaddy

  1. I tried running the nodes with their own /temp directory local to each of them, and all jobs on those nodes would fail to copy. Maybe my issue is with tdarr, then?
  2. I know this is an older topic, but I tried to follow the instructions and guidance given and am unable to get this working. Perhaps a change in unRAID is preventing this process from working, but I wanted to see if I could get it to work. I have updated my "go" and "smb-extra.conf" files as noted below. The share does show up, but it is not accessible. Let me know what I'm missing, please (see the config sketch after this list).

     go file contents:

        #!/bin/bash
        # Start the Management Utility
        /usr/local/sbin/emhttp &
        # force iptable mangle module to load (required for *vpn dockers)
        /sbin/modprobe iptable_mangle
        # Create transcodes share
        mkdir -p /mnt/transcodes

     smb-extra.conf file contents (the username has been modified to protect the innocent; the actual configured user on the server is used):

        [transcodes]
        path=/tmp/transcodes
        comment =
        browseable = yes
        valid users = <my_user>
        write list = <my_user>
        public = yes
        writeable = yes
        vfs object =

     The directory is created, and I noticed that it is owned by root, so I ran "chown nobody:users /mnt/transcodes" and "chmod 777 /mnt/transcodes", but that changed nothing. The share is still inaccessible from my PC. My goal here is to present a ramdisk share that is accessible from multiple nodes on my network for tdarr transcoding, as opposed to wearing out the temp_pool SSDs that I had set up for this. If I were only transcoding on my server, I would just pass the /temp path into /dev/shm, but that isn't accessible from my other nodes and the jobs would fail.
  3. A BMC reset should have little or nothing to do with your unRAID server's performance. It is designed to access and manage the hardware associated with your motherboard, monitor onboard sensors, set manual fan curves, update the BIOS, and provide remote console-level access. Don't fret over a reset to defaults.

     Spin down of drives has several factors that play into it. If you are using any SAS drives, you absolutely must install the SAS drive plugin (from the Apps tab). This is a known issue with SAS drives and things are getting better, but you should still use the plugin until there is better native support for SAS drive management. If you are not using SAS, then you should take a look at your Drive Settings to validate/update your spin down wait time. Also, are you using the Turbo Write plugin? This will affect drive spin up. What is your mover schedule? Are you using the Mover Tuning plugin? If you have a smaller cache, have mover scheduled to run frequently to keep the cache from filling up, and are doing lots of writes to your cache, then I would set a minimum 1-hour wait time on spin down, with mover running every hour and turbo mode enabled for mover. Disks spinning at low speed with minimal disk activity is better for the life of the disk than lots of spin ups and spin downs. If disk life isn't a concern and you just want minimum power usage, then shorter wait times are better for your situation. Also, you may consider not using cache for those shares and have them write direct to disk; then you can set your mover to run less frequently for the shares where you do need fast write speeds going to long-term disk storage. These are all things to consider in building out your storage and performance plan.

     "I cant get the parity to stick" is a bit more concerning and very unclear as to what you mean. If you haven't yet, start a separate thread for this issue in the General Support forum, as this would be completely unrelated to the IPMI plugin, and absolutely post with diagnostics so people don't have to ask for them to assist you. If your system is locking up or crashing before a parity check can complete, then you need to take a step back, do hardware testing to find and isolate the issue, then slowly build upon a stable and working platform.
  4. I have an ASRock Rack board and mine displays the IPMI/BMC IP address as soon as I power up, before the system completes POST to access the BIOS. I can then look for the MAC in my DHCP server. If you are not able to do that and you have a managed switch, you should be able to SSH into your switch and use commands similar to "show arp" and/or "show mac-address" (google documentation specific to your switch) to identify the MAC addresses by port. If that is not an option for you, then you could also try a network port scanner to find all active IP addresses on your network and their associated MAC addresses (https://www.softperfect.com/products/networkscanner/). I've used this one many times over the years with great success.

     Resetting your BMC should have no effect on your server's functionality. If your BMC was set with a static IP that is not part of your existing subnet, then doing the reset will be the only way to get it back to defaults, which should set the network interface to use DHCP. None of the methods I outlined above will work if the IP was set to a static address outside of your existing network scope. The only thing that might work is if your system displays the IPMI/BMC IP address pre-POST and you add an IP address to your client computer that is in that subnet's scope. Example: your home network is 192.168.0.x 255.255.255.0 and the BMC was set to 10.0.0.86 255.255.255.0. You could add a second IP address to your computer's NIC by accessing the adapter's properties: set your IP manually to what your current address is so you maintain connectivity to your existing network, and add a second IP address of 10.0.0.87 255.255.255.0 (not the same as the IP of the BMC, but on the same subnet). You should then be able to access the web interface of the BMC by going to the BMC's IP address in your web browser, log in, and reset the settings (see the sketch after this list). As a side note, if it has a static IP then it probably has a non-default password, and your only recourse will be to reset the BMC in the BIOS.
  5. I'm also an X570D4U-2L2T user and would LOVE to get the fan controller functionality working. This is a GREAT plugin and I love the unRAID community.
  6. So my issue was not with Preclear, but with my memory (as suggested; I think the slower clock speeds with all 4 sticks populated were causing timing issues and corruption, resulting in hung processes). With the 2 sticks at full stock speed, I precleared 5 disks simultaneously without issue. Thanks for looking and offering up guidance and suggestions.
  7. Alright, marking this solved as I've been completely stable since removing 2 sticks of memory. Multiple memory tests were run with no issues found, both in my buddy's bench rig and in my server. I'm thinking the issue is with the system controller trying to run the memory at lower speeds when all 4 DIMM slots are populated. Hopefully this will be fixed with a BIOS update, but for now I'm more than happy to run with the 2 sticks at rated stock speeds, and I don't really need the other 64 GB of memory at this time, as my system is currently only using 14%. I'm going to use these 2 sticks for upgrading my backup server, so it's just one (technically 2) less thing to buy. I GREATLY APPRECIATE all the feedback and advice.
  8. I ran MemTest for 2 days (back in January) before I even booted into unRAID for the first time, so when I started having stability issues memory never even crossed my mind. I'm trying to avoid having my system down for several days, but if it can't be avoided then I guess I must... Things have been rock solid with just the 2 sticks so I'm kinda avoidin pokin the bear.
  9. I've had system instability recently, and someone else mentioned they thought it looked like a memory issue from my diags. I pulled 2 sticks from my system and so far everything has been rock solid. The 2 sticks I pulled are being tested individually on my buddy's bench rig, which he recently put through a full burn-in to validate all the hardware; he has not found any issues with my memory at this time. He is going to be running more extensive testing and will get back to me. If it all tests out OK, then I'm going to reach out to AsRock Rack support to see if there is anything they would like to investigate regarding the issue. The only thing that comes to mind is that the memory is 3200 MHz, and the MB slows it down to 2666 MHz when 4 sticks of DR DIMMs are populated. Perhaps there is a timing issue with the board. I finished my preclears without issue on my backup server, so I will need to get a new external drive to test whether pulling the memory resolved the issue that kept my primary from completing the post-read process.
  10. LOL Yes, I do really love this motherboard. Thinking of buying a second for my other server, moving this Ryzen 9 over, and downgrading my primary to a 65W TDP CPU like the Ryzen 7 3700x. This server only has 2U of workable space with the 12 bays in the back, while my other server is a full 4U, so I can put in a much better cooler and push the CPU without feeling uncomfortable about the fan noise or the temps. Thanks for looking at the logs; my stress level has come down quite a bit since things appear to be very stable at this point. If all the memory modules pass after rigorous testing, then I'll just put all 4 sticks back in. If I start having problems again, then it will be time to engage AsRock to see about RMA'ing the board and/or a possible BIOS update from them. I know they are a bit slow to release stable BIOS updates, but support seems to be pretty good with providing beta BIOS fixes.
  11. You are correct sir. I was looking at what I'm running now, not what I was running before.
  12. So here are my latest diags. Parity check finished successfully yesterday and I let the server continue running through the night to see if I had another hang during an idle period, and all appears to be good at this point. I performed a normal reboot today and let the system run for a while before collecting these logs. No errors that I've noticed but wouldn't mind a second set of eyes on them if anyone is up to it. I've got a friend with a bench rig testing my 2 sticks I pulled the other day with the latest version of MemTest86. Testing them individually and so far no errors. If they both pass all testing then I guess I'll be dropping them back in and see if things continue to run without issues. As for the memory speed question earlier, I'm running a Gen3 Ryzen so it supports 3200 MHz, which is the speed of the memory I purchased from the QVL (4 x KSM32ED8/32ME). titan-diagnostics-20210405-1229.zip
  13. Just for some info/background: the CPU max settings you're seeing are accurate, as I am using the Power Save CPU governor setting from Tips and Tweaks to keep this CPU from running so hot during Handbrake transcodes. I'm going to get a 65W CPU to swap into this server, as I only have 2U worth of space to work with in this chassis, so to keep the fan noise low I'm throttling for now and will swap for a lower TDP part in the near future. I'll be putting this CPU in my backup unRAID server, where I can fit a much larger cooler to better handle the heat, and will offload the transcoding to that machine and just let it go full throttle.

      As for the "network errors," they are all UDP failures for syslog, as I was sending my logs to a remote server during this instability; those are failures of logs trying to write while the LACP team trunk is being negotiated and initialized. As soon as that completes, the errors stop, as it is able to write all events to the remote log server. I've since configured it to just write to the local syslog and will verify after this parity check that those errors are no longer present during the boot process.

      So, after running without issue, or any errors or process-kill messages in the logs, since pulling the pair of memory sticks, I'm really thinking my issues are all related to bad memory. I'm not gonna call this solved just yet, as I'd like the parity check to finish and go through a significant idle period afterward before saying things are now stable. Fingers crossed, all goes well over the next day or two and I can start working on figuring out which stick of memory I'll need to RMA. I do completely agree that I too would like to see a clean log, so after the parity check completes I'll be performing a normal restart and grabbing a fresh set of diags for all to look at just to make sure I'm not missing anything. More to come...
  14. So everything just crapped out again, and I was able to get diagnostics before giving a shutdown command. Time to pull a pair of RAM sticks and see if things get any better. I might pull 3 and just run on one stick at a time till I find out if it is one of them. Is it possible to get @limetech (or someone from support) to take a look and give some ideas on getting this thing resolved? titan-diagnostics-20210402-2155.zip
  15. I'm an idiot... the slow parity check was because of my Resilio Sync docker indexing my files to sync to my backup server. As soon as I stopped the docker, my parity speed shot up to about 155 MB/s. I did cancel the slow parity check, removed the line from my syslinux.cfg, enabled Global C States, and set the Power Supply Idle Control to typical. The parity check started over after the reboot, but I'll keep my sync offline for the time being till this check completes successfully. I'm really hoping this resolves my issues, as I was looking at some long downtimes and additional purchases to get to the bottom of them. Thanks to everyone for the suggestions, ideas, and advice. I'll post back as things progress, whatever the outcome.
  16. Here are my current diags. Someone's previous post mentioned disabling LACP, so I was looking through my system to see if everything was configured correctly and noticed that my 1Gbps dynamic trunk was only showing 1 member. Looking at my switch and the syslogs, I noticed the second member was auto-negotiating down to 100Mbps. I disconnected/reconnected the network cable from my switch and NIC port, then changed both of my trunks on the switch to static LACP trunks, and everything looks good in unRAID now, although bond0 reports as "bond0: IEEE 802.3ad Dynamic link aggregation, mtu 1500" and bond2 reports as "bond2: 2000Mb/s, full duplex, mtu 1500" (see the bond-status sketch after this list). The only config difference between them is that bond0 has an IP address and I'm using that for share access, web UI, etc., while bond2 is my network interface for my docker containers.

      As for disk/controller/cable...
      Controller: LSI SAS 9300-8I
      Chassis: Supermicro CSE-847E16-R1K28LPB 36-bay (24-bay expander backplane on front, 12-bay expander backplane on back)
      Cables: CableCreation Internal Mini SAS HD Cable, 3.3FT Mini SAS SFF-8643 to Mini SAS 36Pin SFF-8087 Cable (1 to each expander backplane)
      Disks: mix of SAS and SATA III 10/12/18TB drives

      All array disks are on the front and the 2 parity disks are on the back. Maybe I just need to shut down and reseat the memory modules and SAS cables. titan-diagnostics-20210402-1433.zip
  17. Yes, I've got 4 x 32GB Kingston ECC from the QVL list. I was going to do some more digging into it once the Parity check completes since I had to do the hard reset the other day, but that won't finish for another 4 days. The parity check had been taking about 1.5 days with dual 18TB drives, but it has been moving incredibly slow since this reboot, not sure if that's indicating another issue or just being exacerbated by the current issue(s). If it locks up before the parity check completes, then I will be reverting the Global C States, removing the syslinux.cfg mod, and pulling a pair of memory sticks to see if things stabilize. Perhaps memtest didn't detect the issue. I'm considering buying an inexpensive MB and CPU that supports this memory so I can build an open case for additional testing.
  18. For those of you running this board, what BIOS changes, if any, did you make for your build? I'm having stability issues and am struggling to pin them down without having to take my system offline for days of testing. After getting my 68TB of data moved and building out my docker needs, I started getting system lockups and was getting about 1-1.5 days of runtime before locking up again. I made the "rcu_nocbs=0-31" change to my syslinux.cfg file (see the sketch after this list) and got 3 days of runtime before locking up again. I got hopeful when I hit the 2-day mark, but it still locked up eventually. I have now disabled Global C States, but people are saying both of these changes should no longer be necessary with 6.9.x. I'm running 6.9.1. Any ideas from your setups would be great!
  19. I ran a memtest for 2 days back in January when I first began assembly. Do you think I should be running another less than 3 months later?
  20. If I'm not supposed to get support here, can someone point me in the right direction? There have been at least a dozen other topics opened since mine, and nearly all have at least one response.
  21. I'm having some serious stability issues as of late. I started building an upgraded server at the beginning of the year, running MemTest for a couple of days, testing the CPU, doing read and throughput testing of the drives, etc., before I started doing anything on the new build. I was running 6.9 rc-20, as I wanted to test out and use multiple cache pools. After my initial burn-in, I began transitioning some of my basic containers that were simple to migrate and began copying my data over from my older server, because I wanted to restructure my data storage and not use the same share structures and settings (lessons learned over the previous couple of years) to provide better management and experience for myself. During this period I also upgraded to 6.9.0 when it was released, and to 6.9.1 a few days after release.

      I had parity enabled initially but decided to disable it during the file transfer to maximize my direct-to-disk writes, as I was moving about 65TB of data. After the data transfer completed, I re-added the parity drives to the array and let parity rebuild. Everything seemed fine until I started getting system hangs, with the GUI unresponsive and typically SSH and direct keyboard access as well (sometimes SSH and direct access will work-ish, although very slow). The first hang occurred within a couple of hours after the rebuild completed. The forced restart triggered another parity check, and the second hang occurred before that check completed. I was getting about 1-1.5 days of continuous runtime before the system locked up. I set up my stable unRAID box (also 6.9.1) as a remote syslog server so I could capture the logs for the system crash (attached as crash_report.txt). I have also attached my diags that I ran immediately after the hard reset.

      I've read a ton of posts and forum topics about stability issues with Ryzen CPUs since I started having these issues. My first build was a Ryzen 7 2700x and I never had a single issue like this (all issues were user-caused, LOL). This new build is on a Ryzen 9 3950x. I have added the "rcu_nocbs=0-31" line to my syslinux.cfg file and thought that resolved my issues, since I was up and stable for 3 days, but... it locked up again. With this forced restart I have now also disabled Global C States in the BIOS, since that seems to provide mixed results for Ryzen users. I'm hoping someone can take a look and give some advice as to any other issues they think might be causing me heartburn ATM. titan-diagnostics-20210331-1146.zip crash_report.txt
  22. Tried to preclear a new 12TB Easystore drive via USB (like I have done with all my shuckable drives before shucking) twice now, and it fails the post-read both times about halfway through the process. I'm running badblocks on the drive now (see the sketch after this list), but this appears to be the same issue previously reported. I am running the latest version of the plugin. Will post back with badblocks results in a couple days...

      Log:

        Mar 30 19:09:23 Titan preclear_disk_5147485459353754[29121]: Post-Read: progress - 50% verified @ 170 MB/s
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: cmp command failed - disk not zeroed
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd - read 6607001878528 of 12000138625024 (5393136746496).
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: elapsed time - 9:50:16
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd command failed, exit code [141].
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6585550110720 bytes (6.6 TB, 6.0 TiB) copied, 35273.1 s, 187 MB/s
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3141291+0 records in
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3141290+0 records out
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6587762606080 bytes (6.6 TB, 6.0 TiB) copied, 35286.2 s, 187 MB/s
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3142296+0 records in
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3142295+0 records out
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6589870243840 bytes (6.6 TB, 6.0 TiB) copied, 35299.3 s, 187 MB/s
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3143298+0 records in
        Mar 30 20:09:59 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3143297+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6591971590144 bytes (6.6 TB, 6.0 TiB) copied, 35312.3 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3144311+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3144310+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6594096005120 bytes (6.6 TB, 6.0 TiB) copied, 35325.3 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3145369+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3145368+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6596314791936 bytes (6.6 TB, 6.0 TiB) copied, 35338.3 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3146396+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3146395+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6598468567040 bytes (6.6 TB, 6.0 TiB) copied, 35351.3 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3147382+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3147381+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6600536358912 bytes (6.6 TB, 6.0 TiB) copied, 35364.2 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3148391+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3148390+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6602652385280 bytes (6.6 TB, 6.0 TiB) copied, 35377.2 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3149405+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3149404+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6604778897408 bytes (6.6 TB, 6.0 TiB) copied, 35390.2 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3150464+0 records in
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 3150463+0 records out
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: dd output: 6606999781376 bytes (6.6 TB, 6.0 TiB) copied, 35403.3 s, 187 MB/s
        Mar 30 20:10:00 Titan preclear_disk_5147485459353754[29121]: Post-Read: post-read verification failed!
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Error:
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.:
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: ATTRIBUTE               INITIAL  NOW  STATUS
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Reallocated_Sector_Ct   0        0    -
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Power_On_Hours          76       128  Up 52
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Temperature_Celsius     33       36   Up 3
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Reallocated_Event_Count 0        0    -
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Current_Pending_Sector  0        0    -
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: Offline_Uncorrectable   0        0    -
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: UDMA_CRC_Error_Count    0        0    -
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: S.M.A.R.T.: SMART overall-health self-assessment test result: PASSED
        Mar 30 20:10:04 Titan preclear_disk_5147485459353754[29121]: error encountered, exiting...

      Edit: I'm guessing the issue is with my unRAID server. I moved this drive to my other server and it is now more than 70% complete with the post-read process. Need to track down my issues with my new build... Grrr
  23. Have you tried this? https://forums.unraid.net/topic/76812-possiblity-of-microsoftpowershell-plugindocker/
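
Regarding the ramdisk share in post 2: one detail that stands out is that the go file creates /mnt/transcodes while the [transcodes] stanza points at /tmp/transcodes, and the trailing "vfs object =" line is left empty. Below is a minimal sketch of a matching setup; the tmpfs size, the <my_user> placeholder, and the exact Samba options are illustrative assumptions rather than a verified fix.

    # go file additions (assumed): create and mount a RAM-backed scratch area
    mkdir -p /mnt/transcodes
    mount -t tmpfs -o size=16g tmpfs /mnt/transcodes   # example size; keep it well under installed RAM

    # smb-extra.conf stanza (assumed): the share path must match the mounted directory
    [transcodes]
        path = /mnt/transcodes
        browseable = yes
        public = yes
        writeable = yes
        valid users = <my_user>
        write list = <my_user>
        create mask = 0777
        directory mask = 0777

After editing smb-extra.conf, Samba needs to re-read its configuration (on unRAID/Slackware, something like /etc/rc.d/rc.samba restart, or simply a reboot) before the share behaves as defined.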
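
Regarding the second-IP trick in post 4: a quick sketch of temporarily adding an address in the BMC's subnet from a client machine, using the example addresses from that post. The interface names ("Ethernet", eth0) are placeholders for whatever the adapter is actually called.

    # Windows (elevated prompt): add 10.0.0.87/24 alongside the existing address
    netsh interface ipv4 add address "Ethernet" 10.0.0.87 255.255.255.0

    # Linux equivalent: add the temporary address, then remove it when done
    sudo ip addr add 10.0.0.87/24 dev eth0
    sudo ip addr del 10.0.0.87/24 dev eth0

With the extra address in place, the BMC web interface should be reachable by browsing to the BMC's IP (10.0.0.86 in the example).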
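
Regarding the bond0/bond2 reporting difference in post 16: the kernel's bonding driver exposes the negotiated mode and each member's link speed, which is a quick way to spot a member that has auto-negotiated down to 100Mbps. A small sketch; the bond and NIC names are examples.

    # Show bonding mode, LACP (802.3ad) state, and each member's speed/duplex
    cat /proc/net/bonding/bond0
    cat /proc/net/bonding/bond2

    # Check the negotiated speed of an individual member NIC
    ethtool eth1 | grep -E 'Speed|Duplex'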
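
Regarding the "rcu_nocbs=0-31" change mentioned in posts 18 and 21: it is added to the append line of the boot entry in /boot/syslinux/syslinux.cfg (editable from the flash device page in the unRAID GUI). Roughly, and allowing for differences between unRAID versions, the edited entry looks like this; 0-31 assumes a 16-core/32-thread CPU.

    label Unraid OS
      menu default
      kernel /bzimage
      append rcu_nocbs=0-31 initrd=/bzroot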
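
Regarding the badblocks run mentioned in post 22: a destructive write-mode pass is the usual choice on a drive that does not hold data yet. A sketch; the device name and block size are placeholders, and -w erases everything on the disk.

    # Four-pattern destructive write/read test with progress; never run on a disk holding data
    badblocks -wsv -b 4096 /dev/sdX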