DaveDoesStuff

Everything posted by DaveDoesStuff

  1. Ok so I managed a parity check a few days ago, but I wanted to wait until it had been up and running for at least a week. As of Friday it has been. During this whole time I have had the nVidia plugin disabled/removed, and now there are no crashes, USBs getting corrupted, other errors, etc... Also, as my issues started after moving to 6.9 RC1, I definitely think the issues I've experienced since have to have been related to the nVidia plugin or the change in how unRaid works with the driver. None of these issues were happening when the driver/support was still baked into the OS natively.
  2. Presumably no. I'm trying another parity check now that work is finished for the week. My unRAID is host to my pfSense router with a dedicated dual NIC, so I didn't want to risk doing anything mid-week. Will post back if it completes... but I've changed literally nothing... except I've disabled the nVidia plugin, as there were a ton of errors in my logs related to it.
  3. Yeah, I had roughly the same thing happen with the first USB that "went bad": I just restarted to change some fan profiles and then it was suddenly f'ed. That was after the same USB 3.0 stick and USB 3.0 port being fine for over a year prior to 6.9.X
  4. Hmm, did you have any USB issues lately as well? I've burned two flash drives since 6.9.1/6.9.2 and had many system lockups etc... What other issues have you been having? Maybe something will jump out at me!
  5. The USB has been replaced with help from support on the licence. However, after I left a parity check running overnight I awoke to a dead network (I run virtual pfSense and a dedicated dual NIC) and the lovely kernel panic attached. Powering off and on brought it back up, and for now I'm running minimal dockers/VMs and have stopped the auto parity check. It's probably just my paranoia now, but I can't help but feel like this isn't a coincidence... Edit: forgot to add that I will endeavour to capture and attach diagnostics when I am back home.
  6. That's definitely on the cards, but only when the spare/replacement flash drives arrive. It's not an option now, as the last time I had these symptoms the drive was dead after restart. ...I mean, what are the odds of both drives dying/having these issues within a 2-month period? Seems slim... but maybe I'm overthinking this.
  7. Howdy folks, as the subject says, I have a second USB drive failing in two months. Last time it happened I didn't catch what was happening until after the drive (Kingston Datatraveller USB 3.0) had completely failed and unRAID wouldn't boot after a restart I was doing for unrelated reasons. Hours of worry, a new USB (same model/brand/spec) and a licence transfer later, it was fixed.
     Now, 2 months later, I got a notification via Telegram to say that Fix Common Problems detected a problem with read errors etc... and lo and behold, it looks like the issue is happening again. It should be mentioned that a few weeks back I had taken my server offline to change some BIOS fan settings, and on first restart the USB failed and I had to plug it into another PC, repair it and try again (it worked). This time around, the port the failing USB is in is different, not even close to the previous one on the motherboard rear IO.
     I've ordered some new USB 2.0 Datatravellers and emailed limetech about re-assigning my licence again within the 12 month time limit. However, I'm concerned I'll be right back where I am in a short while. I'm also concerned that as soon as I restart I will be unable to boot again with the current device, which would be in keeping with what happened the last time around. Are there any "in-place" actions I can take to repair the current drive without a restart? Or at least to confirm it is screwed? (And ideally why.) Obviously I've gone over my logs etc, but I'm not sure what I should be looking for here; any steer in the right direction would be appreciated.
     Additional Info:
     Motherboard: TUF B450-PLUS GAMING
     BIOS: Version 3002 (AMD AGESA 1.2.0.1)
     USB Port: USB 2.0 (Port 11 bottom, according to the diagram in the user manual)
     CPU/RAM/GPU/PSU: Ryzen 2700X @ 3.7GHz stock, 32GB RAM (4x Team Group Vulcan Z T-Force 8GB DDR4 3000MHz), Silverstone Essential 550W 80 Plus Gold, nVidia Geforce GT710 1GB
     USB: Kingston Datatraveller USB 3.0. Can't find a link to the particular one I have anywhere (guess they are out of production) but it looks like the below (ignore the model number):
     ibstorage-diagnostics-20210531-1126.zip
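     For anyone wondering what kind of "in-place" check I mean, a minimal read-only sketch (assumptions: dosfstools is available on the box and the flash key is /dev/sda1 mounted at /boot; substitute the real device on your system):
     # Look for recent USB resets / I/O errors against the flash device (assumed sda)
     dmesg | grep -iE "usb|sda" | tail -n 50
     # Confirm the key is still readable at all
     ls /boot/config > /dev/null && echo "flash still readable"
     # Check-only FAT filesystem scan; -n makes no changes (needs dosfstools)
     dosfsck -n /dev/sda1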
  8. Fair enough, I'll try to over-engineer something to achieve the same thing! Thanks again.
  9. Hmm, didn't know that. To be fair, it would be counter-intuitive for me to assume it's a load of crap lol, but thank you for the info. So obviously there *was* an issue that led to these two devices racking up 58.5TB in writes in just 4 months since I began using them... and it seems like something I did either earlier or in the last 72 hours has actually fixed it, but I was focusing on metrics that don't actually represent the TBW, or possibly even misrepresent it. Now I'll just have to re-enable Qbtorrent and monitor it closely, then if that checks out, Duplicati (which I feel is possibly the culprit), and take some additional actions depending on the outcome. Thank you for the fresh set of eyes on this Jorge! I had complete tunnel vision until you came along lol! Do you use any particular tools or commands on your own setup(s) to monitor this kind of thing? Or anything you would recommend over what I've utilised in this thread?
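     For reference, a minimal sketch of the sort of per-drive check being discussed here (assumptions: the cache drives are /dev/nvme0n1 and /dev/nvme1n1, and smartctl exposes the NVMe "Data Units Written" counter, where each unit is 1000 x 512-byte blocks):
     # Convert NVMe "Data Units Written" to TB for each cache device
     for d in /dev/nvme0n1 /dev/nvme1n1; do
       smartctl -A "$d" | awk -v dev="$d" '/Data Units Written/{
         gsub(",","",$4);                     # strip thousands separators
         printf "%s: %s units (%.1f TB)\n", dev, $4, $4*512000/1e12
       }'
     done
     For example, 114,313,326 data units x 512,000 bytes is roughly 58.5 TB, which matches the figure in brackets.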
  10. So after running overnight... Cache 1: 114,313,326 [58.5 TB] Cache 2: 114,312,943 [58.5 TB] That's a delta of roughly 35,000 on each drive. No, this does not seem like a lot... but unRAID Main reporting 3.9 million when it was around 2.6 million yesterday evening is still pretty bad for a server that was essentially only running a handful of low-IO dockers and a pfSense VM.
  11. I believe what I posted is actually for the total time this "server" has been in operation/powered on. That's probably a bit useless, come to think of it; I picked it up in a bug report. Currently the 2 cache drives report: Cache 1: 114,276,540 [58.5 TB] Cache 2: 114,276,147 [58.5 TB]
  12. Yeah, for sure it's not adding up. I've got Qbtorrent disabled now; at time of writing this the TBW is: Let's see where it sits tomorrow morning. Thank you for the assist by the way!
  13. Nothing beyond the standard Radarr/Sonarr > Qbtorrent > Plex setup used by many. Here is a screenshot of QB at present along with a fresh iotop; the QB screenshot effectively represents my last 72 hours of torrenting. As you can see, not a ton. From what I've seen on the forums here I'd say it's a standard amount even. EDIT: I've just noticed that iotop, despite being the same window running for the last 6 hours, seems to be giving inconsistent stats... haven't used it outside of troubleshooting this issue, so not sure if that's normal. For example, in the screenshot in this post taken at roughly 6hrs run time, compared to the previous 4hr one, the entries at the top for highest write are missing. EDIT2: Added share settings to the OP.
  14. That's what I've been thinking... but not sure how to narrow it down further, or possibly go beyond whatever bounds iotop has. Here is a screenshot from 4 hours, not much difference.
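      One way to narrow it down below whatever iotop can show is the block I/O Docker itself tracks per container; a rough sketch (the one-hour window is just an example):
      # Cumulative block I/O per running container, two snapshots an hour apart
      docker stats --no-stream --format "{{.Name}} {{.BlockIO}}" > /tmp/io_1
      sleep 3600
      docker stats --no-stream --format "{{.Name}} {{.BlockIO}}" > /tmp/io_2
      diff /tmp/io_1 /tmp/io_2                # whichever container moved is the writer
      # Caveat: this is cgroup-level block I/O, so loopback/page-cache effects can skew it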
  15. Hi all, I've taken so many different actions to try and resolve this issue that it's hard to collate everything and to know where to start, but here goes... (Diagnostics attached.)
      Essentially, since 6.8.3 I've been struggling with the excessive cache writes issue. Due to personal circumstances I was not able to do any meaningful troubleshooting until January or so, after moving to 6.9.0. It should also be noted that I'm not sure whether I was experiencing excessive writes prior to December of 2020, when my SSD cache pool was 4x 250GB SATA SSDs in parity (BTRFS) mode, but I have since upgraded to 2x WD Blue SN550 1TB NVMe (WDS100T2B0C), which is when the issue first came to my attention or possibly even started. Since they've been installed (to this date) they have 114,252,802 [58.4 TB] data units written each.
      I've pored over every forum/reddit post about this issue and tried everything in them except for moving my cache over to XFS/unencrypted, as I strictly require the encryption and redundancy. I know I'm making life hard on myself with this... but I need it. I'll have more specifics on writes below.
      My specs are: Ryzen 2700X, TUF B450-PLUS GAMING, 32GB RAM (4x Team Group Vulcan Z T-Force 8GB DDR4 3000MHz), Silverstone Essential 550W 80 Plus Gold, nVidia Geforce GT710 1GB.
      My Array Configuration: Share Settings:
      After upgrading to 6.9.0 I've:
      Reformatted the cache pool to the 1MiB partition layout as described by limetech HERE.
      Switched from the official Plex docker to the Binhex Plex Pass container.
      Tried toggling various dockers/VMs on and off to find a culprit, no joy.
      After upgrading to 6.9.1 I've:
      Switched Docker to use a directory instead of an image.
      Moved Docker back to a fresh btrfs image after the above didn't work.
      Tried toggling various dockers/VMs on and off to find a culprit, no joy.
      After upgrading to 6.9.2 on 18/04 I've:
      Moved Docker back to using a directory instead of an image again.
      Disabled my Duplicati docker and anything else I don't utilise often (bazarr, krusader... etc...).
      Disabled all my W10 VMs, only a pfSense VM running now.
      Tried toggling various dockers/VMs on and off to find a culprit, no joy.
      Following my upgrade to 6.9.1 and the subsequent actions, I let it run for 1 month without my interfering, and in that time the cache had over 60 million writes... and the USB key failed. Which is actually what precipitated my upgrade to 6.9.2 this past Sunday and another round of troubleshooting this issue.
      TBW according to the CLI:
      cat /etc/unraid-version; /usr/sbin/smartctl -A /dev/sdb | awk '$0~/Power_On_Hours/{ printf "Days: %.1f\n", $10 / 24} $0~/LBAs/{ printf "TBW: %.1f\n", $10 * 512 / 1024^4 }'
      version="6.9.2"
      Days: 2261.2
      TBW: 81.6
      TBW: 217.8
      Cache Settings showing alignment: Current Docker Settings: Currently Running Dockers and their mappings: Screenshot of Main 14hrs after a stats reset (after moving Docker back to directory mode from img):
      As you can see from the above screenshots, the writes are excessive. This is despite the cache re-alignment, the move back to using Docker in directory mode, and the disabling of the only docker that should be doing large writes because of how I have it set up (Duplicati). The only VM I'm running is my pfSense, and I've disabled any intensive logging I had running there. 14 hours before the time of the screenshots (after moving Docker back to directory mode) I cleared the stats on Main and left an iotop -ao running on my laptop. Unfortunately the laptop decided to reboot for updates (damn) during this time, so I don't have the output from iotop, but as you can see, in that period the cache had over 2.3 million writes with not a whole lot going on/running. I've run another iotop -ao for the last 3 hours before writing this post (without changing anything else) to give some sort of idea of what that overnight output would look like if I had it:
      I should also mention that yesterday I ran each docker one by one (with all others disabled) and ran iotop -ao for 10 minutes each, but the results were inconclusive. As I've mentioned, I have to have the cache encrypted and redundant. I do understand there is write amplification expected from the encryption etc... but it shouldn't be this high. I've tried to be thorough and include as much current information and background as possible, but if I've missed anything please don't hesitate to ask! I can't help but feel like I'm missing something... any help/different perspectives would be very very welcome at this point.
      ibstorage-diagnostics-20210420-1106.zip
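      As a rough cross-check on the numbers Main reports, a sketch of sampling the write counters straight from /proc/diskstats (assumes the cache device is nvme0n1; the window length is arbitrary):
      # Field 10 of /proc/diskstats is sectors written (512 bytes each)
      before=$(awk '$3=="nvme0n1"{print $10}' /proc/diskstats)
      sleep 600                               # 10-minute window
      after=$(awk '$3=="nvme0n1"{print $10}' /proc/diskstats)
      echo "written in 10 min: $(( (after - before) * 512 / 1024 / 1024 )) MiB"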
  16. I was able to get into the webUI and do it. However as of the 6.9.1 update this is broken for me again. No combination of installing and re-installing will fix it. So I've simply stopped using GUI mode completely.
  17. That's actually a really cool website, bookmarked. It never occurred to me that the media, coupled with the lack of H.265 support, was the problem... but it does make total sense, thanks for the steer. I'll try a different format and lower the quality and report back! EDIT: So the HW transcoding kicks in when playing this second show and transcoding to h.264... but it seems to be using CPU not GPU... or am I misreading this? Hmm, does the source also have to be h.264, or could a GT710 transcode HEVC Main 10 to h.264 at all? I clearly need to improve my knowledge in this area
  18. Firstly, great plugin/work! Unfortunately I'm having some issues getting Plex transcoding to work on my GT710. The very first time I installed the plugin and set everything up, it worked fine until my next system restart. Since then, every combo of troubleshooting steps I've tried has failed to get it to ever work again. I'm running binhex-plexpass with a lifetime Plex Pass. The rest of the info is in the screenshots; I've tried to include any/everything I've seen in the other posts in this thread that seems to be of help. Sorry if I've overdone it. It should be noted I have no errors in the system log or the container log for Plex. None whatsoever. EDIT: For clarity, I'm doing the PCI override because I have a dual Intel NIC for a dedicated pfSense VM, so it's not really optional.
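      In case it is useful, a minimal sketch of the checks that would confirm whether the container can actually see the card (assumptions: the usual plugin setup of --runtime=nvidia plus the NVIDIA_VISIBLE_DEVICES variable, and a container named binhex-plexpass):
      # On the host: is the driver loaded and the GT710 visible?
      nvidia-smi
      # Inside the container: is the same GPU visible to Plex?
      docker exec binhex-plexpass nvidia-smi
      # While a transcode is playing, the Plex transcoder process should show up here
      watch -n 2 nvidia-smi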
  19. This was resolved by installing the Nvidia Driver plugin (and possibly limetech's underlying nvidia driver stuff).
  20. Hi team, I upgraded to 6.9.0-rc2 last night after I was forced to look into some issues with a parity drive (unrelated) and decided I was sick of the warnings about the mover tuning plugin being incompatible with the stable branch (I'm easily annoyed). I rely on booting unRaid in GUI mode since I run a pfSense VM and 2 dedicated NICs as my home router via unRaid... and obviously this means that if the array is down there is no webUI... hence my need for a working GUI mode on boot. After upgrading, however, all I get after going through the boot sequence is a black screen with a blinking cursor in the top left instead of the GUI.
      I've tried various ways of troubleshooting this myself, such as:
      Enabling CSM
      Enabling CSM but allowing UEFI only/first
      Disabling CSM and forcing UEFI only
      Reseating my GT 710
      BIOS defaults and redoing my settings one at a time
      GUI safe mode (issue persists/no change)
      I can boot in normal OS mode (CLI) no problem. I was able to work around my pfSense going down by plugging a laptop directly into the unRaid NIC on the mobo (the pfSense NICs are on a separate card) and accessing the webUI, but this isn't a practical/long-term solution. Interestingly, while the "system profiler" tool does not seem to detect the GT 710, it does appear as an option for GPU if I go into the create VM wizard (currently no VM is using the GT 710, so there shouldn't be anything "claiming" it, and it has not been isolated in any way). ibstorage-diagnostics-20210215-1138.zip
  21. A restart and a BIOS update seem to have fixed this. After running benchmarks it looked like the controller on my ASUS TUF B450 was having issues, hence why I decided to try a BIOS update.
  22. The original diagnostics I posted were from straight after the crash/shutdown, and it just occurred to me that a recent set might be more useful (attached). ibstorage-diagnostics-20210112-0922.zip
  23. So yesterday around 12:30pm (ish, should have noted it down) my entire unRAID server went unresponsive. It was apparent since I run a 2x GbE card on it with pfSense... kinda easy to take notice when your internet goes down with it lol. Went to my home rack and tried to log in via the GUI, which was frozen and wouldn't accept any inputs. So I made the tough call to power off. Upon powering on, the array started smoothly, all dockers and all VMs on... no problems/everything seemed fine. Unfortunately there was no way for me to retrieve my syslog prior to rebooting 😪 However, as of last night/this morning my CPU (AMD 2700X) has been going wild doing the parity check, so much so that I can't stream anything from Plex. Based on this I powered off or paused most if not all VMs/dockers/etc that could be writing to the array. CPU usage/temps went down, but no improvement on the Plex front. It was then I noticed that the parity check had been running for over 18 hours with 2+ days estimated remaining... speed crawling along at around 16MB/sec. Pretty big yikes there. Hit pause on the check, turned off all my dockers, and resumed, and speeds stayed the same. Decided to take this as a bad omen and reach out for some second opinions on possible causes here. Diagnostics attached. It should be noted that there is 1 drive with some SMART errors, and it's "Parity 1", a 4TB WD drive; it has a Reallocated sector count of 8 and a Reported uncorrect count of 24. However, when I manually run the extended test it comes back as having no errors... which is strange. This has been the case for almost a year and I've never had parity problems before after a shutdown/reboot, be it clean or unclean. I'm not saying that's not a candidate for the cause... just saying that it seems unlikely when it's been like that for so long. Any and all help/pointers appreciated. ibstorage-diagnostics-20210111-1307.zip
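      For anyone comparing notes on that parity drive, a sketch of the attributes worth watching and how to re-run the extended test (assumes the drive is /dev/sdX; substitute the real letter from Main):
      # Attributes that matter most for a drive throwing reallocated/uncorrectable errors
      smartctl -a /dev/sdX | grep -E "Reallocated_Sector_Ct|Current_Pending_Sector|Reported_Uncorrect|Offline_Uncorrectable"
      # Kick off another extended self-test, then check its status later
      smartctl -t long /dev/sdX
      smartctl -a /dev/sdX | grep -A1 "Self-test execution status"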