DarkMain


Everything posted by DarkMain

  1. 5 days and there have been no more crashes. Looks like the macvlan to ipvlan change was the fix. Cheers.
  2. I did see that in the patch notes, but a couple of things stopped me from changing it. 1 - Unless the setting was changed when I updated to 6.12, it's been macvlan for ages and never been a problem. I figured (perhaps incorrectly) that since it hasn't caused a problem in the past, why change it. 2 - The "help" says "The ipvlan type is best when connection to the physical network is not needed.". Maybe I'm interpreting that incorrectly, but my containers are a combination of host / bridge, and one is br0 and has its own IP address. I was worried that by changing it to ipvlan I might break something, so I just left it. I'll give it a shot though and see how it goes.
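For anyone unsure what the macvlan/ipvlan distinction actually means for a br0 container with its own IP, the plain-docker equivalents below illustrate it. This is a sketch only: Unraid flips this globally under Settings → Docker ("Docker custom network type"), and the interface name, subnet, and gateway here are placeholders for your own network.

```shell
# Illustrative only -- Unraid manages this itself; these commands just show
# what the two modes mean. br0, subnet, and gateway are placeholders.

# macvlan: each container gets its own virtual MAC address on the parent
# interface (some routers/NICs dislike the extra MACs, which is one source
# of the call-trace crashes)
docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
  -o parent=br0 macvlan_net

# ipvlan (L2 mode): containers share the parent's MAC but still get their own
# IP on the LAN -- so a br0 container with a static IP keeps working
docker network create -d ipvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
  -o parent=br0 -o ipvlan_mode=l2 ipvlan_net
```

The practical upshot: host/bridge containers are unaffected by the setting, and a br0 container keeps its dedicated IP either way; the only externally visible change is that its traffic comes from the NIC's MAC instead of a per-container one.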
  3. So I rebuilt the docker image last night and when I got home from work today the server had crashed again. I've attached the new syslog. syslog-127.0.0.1.log
  4. Just out of curiosity, what in the log gave you that answer? And for my own peace of mind... are all the following messages OK? Dec 4 20:51:00 Tower kernel: BTRFS error (device loop2: state EA): parent transid verify failed on logical 335855616 mirror 2 wanted 5033596 found 5032512
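A "parent transid verify failed" on loop2 points at the btrfs filesystem inside docker.img itself, and the usual remedy (the one this thread ends up using) is to recreate the image rather than repair it. A rough sketch of the steps, assuming the default Unraid image path and service script; the normal route is the Docker settings page in the webGUI, and containers come back via Apps → Previous Apps with appdata untouched:

```shell
# Paths/commands assume Unraid defaults -- check Settings -> Docker for your
# actual image location before deleting anything.

/etc/rc.d/rc.docker stop                   # or disable Docker in the webGUI
rm /mnt/user/system/docker/docker.img      # drop the corrupt image
/etc/rc.d/rc.docker start                  # re-enabling creates a fresh image

# Optional sanity check on the new image's filesystem once Docker is back up
btrfs scrub start -B /var/lib/docker
```

Once the image is rebuilt, any further transid errors on loop2 would suggest the underlying device (cache/pool) rather than the image.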
  5. K, here's the syslog. Looks like it might be something to do with unassigned drives and a btrfs file system? Note: These drives were all working perfectly fine before the update. syslog-127.0.0.1.log
  6. Yesterday I updated to 6.12.5 from 6.11. Everything seemed to go fine, but when I got home from work the server had crashed. I reset it and it was running fine... I watched a movie from Plex and then went to bed, then today when I woke up it had crashed again. There have been no major changes to the system. Hardware is the same, and plugins were updated before the OS update, but that's it. I have attached the diagnostics to this post. The syslog server was not enabled during the first 2 crashes, but it's on now (however it doesn't seem to be writing anything to the local syslog folder). tower-diagnostics-20231203-1532.zip
  7. Update: The 870 EVO seems to be working fine as well (network speed inside the VM is lower than expected, but that's another issue to figure out). The strange thing is, I have taken one of the older SSDs that wasn't working properly in UnRaid and put it into a Windows machine, and I cannot, for the life of me, recreate the issue. So the problem has been 'solved', but I still don't quite understand why it was happening in the first place.
  8. Got the VM set up on the 8TB drive and I'm not able to recreate the issue using this drive either. This time it's set up as an unassigned drive (rather than a cache pool). It's formatted as btrfs; however, I'm pretty sure some of the SSDs I tested were also btrfs, so I don't think it's a file system thing (although I can't be 100% on that). Looks like I'm off to the store tomorrow to pick up an 870 EVO and see how that goes.
  9. Cheers, I'll update the thread in a couple of days with the results.
  10. K, so I have tested it with the IronWolf and I was unable to recreate the issue with my usual go-to method of making a VM and then copying a file to the running VM. This method has been pretty much a guarantee to reproduce the problem, so it's good news that it's not happening. I also tested some older VMs running on the IronWolf and they ran much better than on the SSDs. That got me thinking... All of the problems have been when using SSDs (I assumed even a bad SSD would be better than a mechanical HDD, so I have never bothered testing with them). Right now I'm making the IronWolf into the parity drive, and once that has been done (in about 18 hours) I am going to use the old 8TB Seagate Barracuda (which, as you pointed out, is an SMR drive) and run the tests again. If I can recreate the issue, then I can chalk it up to poorly performing drives; however, if the issue still isn't present with the 8TB, I can probably say it's an SSD-only issue... If that is the case, my next step will be to go and buy a "high performance" SSD and test that. Can I get a recommendation from you for what SSD SHOULD work well in UnRaid? I keep seeing the 870 EVO and MX500 popping up as recommended drives but want to double check. Cheers.
  11. I'll give that a shot and let you know how it goes. It's eventually going to be the new parity (and I have a 2nd one to replace the older drives in the array). I'm actually dealing with another issue right now... In the last few days a lot of my drives have started giving me UDMA CRC error counts. It's actually since I put the new LSI 9201-16i 6Gbps 16P SAS HBA card in, so I'm going to have to remove that and put all the drives back onto the motherboard. I had to stop the parity rebuild as drive 5 was getting new error counts every few mins. Not having much luck with the system lately. I guess after, I dunno, 10+ years? (When was UnRaid 3 released?) of no issues, they were bound to catch up to me.
  12. We're not talking about slow performance though. We are talking about a complete freeze in I/O operations. If I try copying a single large file, it will copy, let's say, 2 or 3GB, and then the performance will literally drop to nothing for 10+ seconds (and I get a whole bunch of CPU_IOWAIT errors. Makes sense, as my understanding is that error means the processor is waiting on the drives). Then the speed will jump up again, then back to nothing, yo-yoing up and down until the transfer is finished. It doesn't matter how the drives are used. Cache pool, unassigned drive, passthrough to a VM... it's always the same. It's a bad analogy, but think of it like a CPU that's throttling because of poor cooling. It gets too hot, so the CPU throttles and the performance drops... Because the performance has dropped, the CPU cools down; because the CPU has cooled down, the performance goes up again, but then it gets too hot and throttles... It's kinda like that, but much more aggressive. It doesn't seem to matter how the copy is initiated. Network transfer, Krusader in docker... even the mover script has exhibited this behavior. The SSDs were previously used as Windows boot drives and they were fine then, so even if they aren't performance drives, they should NOT be acting this way. It's not normal and I have never seen this behavior in a drive before. It's really driving me insane, and the fact that a brand new install of UnRaid exhibits the same behavior on two completely (albeit old) systems makes it even harder for me to try and narrow down what's causing the problem.
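To put a number on those CPU_IOWAIT spikes while the transfer yo-yos, a small bash sketch can sample the standard Linux counters directly (nothing Unraid-specific is assumed): the aggregate "cpu" line in /proc/stat is cumulative jiffies, and the iowait delta over the total delta gives the iowait share for the window.

```shell
# Fields on the "cpu" line of /proc/stat: user nice system idle iowait irq ...
# so $6 in awk is cumulative iowait; summing $2..$NF gives total jiffies.
read_cpu() { awk '/^cpu /{t=0; for(i=2;i<=NF;i++) t+=$i; print $6, t}' /proc/stat; }

iowait_pct() {                      # $1 = sample window in seconds (default 1)
  local w1 t1 w2 t2
  read -r w1 t1 < <(read_cpu)
  sleep "${1:-1}"
  read -r w2 t2 < <(read_cpu)
  awk -v w="$((w2 - w1))" -v t="$((t2 - t1))" \
      'BEGIN { printf "%.1f\n", (t > 0) ? 100 * w / t : 0 }'
}

iowait_pct 1     # run in a second terminal during the stall
```

During one of the 10+ second freezes this should print a large percentage, confirming the CPU really is parked waiting on the drives rather than doing work.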
  13. As mentioned previously, the SSDs are NOT part of the array.
  14. Any idea why the SSDs would be going that slow?
  15. It's dropping to below 10MB/s at times and Glances is giving me a "CPU_IOWAIT" error. I know that SMR drives are slower, but they shouldn't be dropping that low. (I'm actually in the process of swapping the parity to one that's not SMR, but got an error, hence the rebuild before swapping the drives.) It's not just the array though. It's ALL the drives that are exhibiting this behavior, even the SSDs in cache pools or unassigned devices. I'm kinda all out of ideas. It's getting bad enough that I've considered just getting a new server and starting again, but considering the issue followed me to completely different hardware on a brand new install of UnRaid, I don't want to spend a heap of money on a new computer just to find it still has problems.
  16. Diagnostics are attached to the first post. ...and these are from right now. tower-diagnostics-20220802-2258.zip
  17. Ok, this is getting ridiculous now. It's happening on a parity rebuild.
  18. Update: I have dusted off an old i5-2500 I had sitting around and did a test using that hardware. The ONLY things that were the same were the test SSD and the test USB. Everything else was different hardware. Booted into UnRaid, started the VM and did a file copy... Same issue. Now I'm completely stumped and have NO idea where to go now with the testing.
  19. Slow SSD removed from the system and test drive connected directly to the motherboard (not in the Icy Dock any more), and the issue is still present. I was able to copy the ISO over this time with no problems, but I'm currently installing the VM and I can see that the write speeds are all over the place (doing the expected 100MB/s+ and then dropping down to 0B/s). I'm pretty much out of ideas now. It's possible it's a BIOS issue, but I have no idea what setting would cause this, plus I have reset the BIOS multiple times while trying to figure this out in the past. Maybe it just doesn't like the motherboard? Or a firmware issue?
  20. No. Like I said in the original post, it's basically any sustained disk activity. Copying from an unassigned drive to a user share using Krusader does it. It's noticeable when installing an OS in a new VM (install times of 2+ hours), or even when using the VM. The VM will report 100% disk activity when doing simple things like browsing the web. Even "Mover" can cause it to happen. Side note: I haven't noticed it happen when using the Unbalance plugin or when doing a file copy via the terminal, although I don't do that very often, so it might just be a coincidence that I haven't noticed it. When I upgraded my cache drive I had to move all the files to the array using Unbalance because Mover was exhibiting the issue and was going to take 10+ hours. Unbalance got it done in about 40 mins, I think.
  21. This issue has been plaguing me for a while now and I finally want to get to the bottom of it. Sometimes when doing a file transfer, the transfer will eventually drop down to 0kbps for a while (10+ seconds) before resuming, building up speed again and then dropping back to 0. It then bounces around like this until the copy finishes. It doesn't seem to matter how I'm doing the transfer or what the source/target drive is. It's always the same. I can recreate the issue 100% of the time when I'm copying a file in a VM, but it also happens when copying a file over the network to a share, or if I'm moving from an unassigned drive to a disk (not a user share). ANY sustained disk activity seems to exhibit the problem. It seems to happen the most when copying larger files; however, lots of smaller copies in a short amount of time can also cause the issue. I have just tried booting UnRaid using a brand new trial version of 6.10.3, which means everything is default settings and 'fresh'. There was a single SSD in the array and NO parity. While copying the Win11 ISO over to the "ISO" share, the issue happened again; I didn't even need to create a VM to recreate the issue this time. This tells me it's hardware related. After I encountered the issue, I then installed the "Glances" and "DiskSpeed" dockers. System specs are:
Unraid: Unraid server Pro, version 6.10.3
Motherboard: ASUSTeK COMPUTER INC. RAMPAGE IV BLACK EDITION, Version Rev 1.xx
CPU: Intel® Core™ i7-4930K CPU @ 3.40GHz
HVM: Enabled
IOMMU: Enabled
Cache: L1-Cache = 32 kB (max. capacity 32 kB), L2-Cache = 256 kB (max. capacity 256 kB), L3-Cache = 12 MB (max. capacity 12 MB)
Memory: 24 GB (max. installable capacity 96 GB)
ChannelA_Dimm1 = 4 GB, 1600 MT/s
ChannelB_Dimm1 = 4 GB, 1600 MT/s
ChannelB_Dimm2 = 4 GB, 1600 MT/s
ChannelC_Dimm1 = 4 GB, 1600 MT/s
ChannelD_Dimm1 = 4 GB, 1600 MT/s
ChannelD_Dimm2 = 4 GB, 1600 MT/s
1000W PSU
I have run a MemTest on all sticks for 48 hours and had no issues. I have also removed all 6 sticks and tested them 1 at a time, and the issue is present regardless of which stick is used, so I don't believe it's a memory problem. The motherboard has 10 SATA ports, but no HDDs are currently connected to them (for troubleshooting reasons). I also have an "LSI 9201-16i 6Gbps 16P SAS HBA" and an "NEC LSI 9207-8i 6Gbs HBA". The 16i was purchased for troubleshooting, to remove the motherboard and its controllers from the equation, but the issue remains regardless of which controller they are connected to. All HDDs are in Icy Docks. The 2.5" drives are in an "ExpressCage" and the 3.5" drives are in some old "FatCages" (I think that's the model. It's the 3to4 and 3to5 ones). I have run "DiskSpeed" on all of my drives and all of them (apart from one) are showing the expected speeds. The 15-second controller/all-disk test also shows expected speeds (although I would say 15 seconds might not be enough time for the issue to show itself). One of my SSDs is showing really slow speeds (20Mbps), which is a new issue, so I'm going to remove it for now, although it's not part of the array or pool so it shouldn't really be affecting anything. I'm going to stick with the trial version and single-drive setup for now and see if I can solve the issue on here before going back to my proper USB. I have attached the diagnostics, but I'm not sure how much help they will actually be. My next step is to remove the backplane from the equation (connect the test drive directly to the card) and see if that improves anything. Any help figuring this out would be greatly appreciated as I'm on my last legs. I have considered just getting a whole new machine and moving the array over, but I'm worried the issue will move with it. Cheers. tower-diagnostics-20220716-2052.zip
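Since the stall shows up on any sustained write, it can be reproduced without a VM at all. The sketch below (the function name and scratch path are my own placeholders) writes fixed-size chunks with an fsync after each and prints per-chunk wall time: a healthy drive shows consistent times, while this yo-yo problem would show as individual chunks taking many times longer. Sizes are kept tiny for a dry run; point it at the drive under test and raise the per-chunk size for real measurements.

```shell
# stall_probe <chunks> <MiB-per-chunk> <path-prefix>
# Writes numbered chunk files with fsync and times each one, then cleans up.
stall_probe() {
  local chunks=${1:-8} count=${2:-16} target=${3:-/tmp/stalltest} i start end
  for i in $(seq 1 "$chunks"); do
    start=$(date +%s.%N)
    dd if=/dev/zero of="$target.$i" bs=1M count="$count" conv=fsync status=none
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" -v i="$i" \
        'BEGIN { printf "chunk %d: %.2f s\n", i, e - s }'
  done
  rm -f "$target".[0-9]*
}

stall_probe 8 16 /tmp/stalltest   # tiny sizes for a dry run; bump for a real drive
```

Running this against a suspect disk, a cache pool path, and an unassigned drive in turn would show whether the 10+ second freezes are tied to a particular controller/drive or hit everything equally, which is exactly the question in this thread.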
  22. I figured that would be the case but just wanted to double check. Better safe than sorry. Cheers.
  23. Hi, I have a server that's having some I/O issues that I'm trying to diagnose. I am able to recreate the issue 100% of the time using a VM (it's not limited to VMs though), which means I don't need a valid array. To try and narrow it down to either a bad config or bad/incompatible hardware, I want to boot my current hardware using a trial version of UnRaid and load a VM onto it to see if it still has the same problem. No issue = config error. Same issue = hardware issue (or incompatibility). I have a couple of unassigned drives with no data on them for testing this, so I won't be touching any of the array drives. My question is... Is there anything I need to consider when booting into the new trial USB so I don't destroy my current setup?
  24. Nope, sorry. Still no solution. I've just installed a new RAID card and pulled all the drives off the motherboard to see if one of the controllers on there was causing the issue. It's honestly driving me nuts.
  25. I have multiple VMs. The video is just showing the issue that is present in all of them. (It was a brand new install of Windows 11 for testing, hence the low RAM.) This one had the same number of CPUs (same config, no other VMs running), 4GB memory (initial and max both the same size), the machine was Q35-6.2, the BIOS was OVMF TPM, and the vDisk was qcow2 with the VirtIO bus. Like I said, it's 100% repeatable within a VM regardless of how the VM is set up. I can make another video showing whatever you like on the screen with any VM if that will help. If I get lucky I MIGHT be able to do a file transfer or two before the issue starts. (It's not just restricted to file transfers either; it's any HDD activity, hence the unusable VMs.) The vDisk is on a UD because it was installed there before cache pools were a thing. I have been using Unraid since very late version 3, so some things may not be 'standard'. (My docker app data is called 'applications' because 'appdata' never existed in earlier versions; same with the 'System', 'Domain' and 'ISO' dirs.) Regardless, with the Windows 11 VM I actually made a new pool to test it out before making this post. Yes, the UD is an SSD and it's formatted in XFS. You're probably right about it being hardware, but I have no idea where to start with diagnosing it, and like I said, VMs are the only thing I have found that I can use to reliably recreate the issue. (It used to be a problem in Krusader, but that was random and also involved an old spinning drive so I just lived with it; however, I haven't seen the problem in Krusader in at least 3 months.)
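One variable worth eliminating in a setup like this is the vDisk format itself: qcow2 allocates clusters on demand, which adds metadata writes on top of the guest's own I/O, whereas raw does not. A hedged sketch of the test, using qemu-img (which Unraid ships for its VM manager); the paths here are hypothetical, so adjust them to the actual UD mount and VM, and take a backup of the vdisk first:

```shell
# Hypothetical paths -- substitute your UD mount and VM directory.

# Confirm the current image format and allocation
qemu-img info /mnt/disks/UD1/domains/Win11/vdisk1.img

# Convert a copy of the vdisk to raw (progress shown with -p)
qemu-img convert -p -O raw \
  /mnt/disks/UD1/domains/Win11/vdisk1.img \
  /mnt/disks/UD1/domains/Win11/vdisk1.raw.img

# Then point the VM's disk at the new file and set the vDisk type to "raw"
# in the VM settings before re-running the copy test.
```

If the stall behaves identically on the raw image, the image format is off the suspect list and the hardware/controller theory gets stronger.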