Pjhal

Everything posted by Pjhal

  1. So I used the BIOS setting that Squid linked to, but I just got this: ''ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen''. See the short remote log: All_2022-2-14-20 11 1.html Full Unraid logs: silverstone-diagnostics-20220214-2013_anon.zip The system hasn't crashed (yet), but I am a bit concerned that it might do so soon. Edit: this seems like the same issue to me: 6.9.0/6.9.1 - KERNEL PANIC DUE TO NETFILTER (NF_NAT_SETUP_INFO) - DOCKER STATIC IP (MACVLAN) At least it is the same one I am seeing in the logs right now; I'm not sure whether it is the same as my original issue. I have now deactivated my second physical Ethernet port. How do I need to change my setup to fix the above? I am using at least 7 different IP addresses on Docker containers, sometimes more, using the macvlan feature (custom br0), and several containers use the default bridge.
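     One thing I am considering (not something confirmed in that thread, just an idea): moving the containers that have static IPs off macvlan and onto an ipvlan network, since ipvlan shares the parent interface's MAC address and seems to avoid the macvlan/netfilter call traces people report. A minimal sketch, assuming the LAN is 192.168.1.0/24 with gateway 192.168.1.1, the parent interface is br0, and the network/container names are made up; on Unraid the custom network is normally managed from the Docker settings page, so this only illustrates the idea:
        # Create an ipvlan network on the same subnet the macvlan custom network used
        # (subnet, gateway and parent interface are assumptions -- adjust to the actual LAN).
        docker network create -d ipvlan \
          --subnet=192.168.1.0/24 \
          --gateway=192.168.1.1 \
          -o parent=br0 \
          br0-ipvlan

        # Move one container with a static IP from the old macvlan network to the new one.
        docker network disconnect br0 some-container
        docker network connect --ip 192.168.1.50 br0-ipvlan some-container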
  2. Thanks for your response, I will try the BIOS options. I should note, though, that it was never an issue until recently, and I have had this server running Unraid since 2019. Also, the post suggests updating the BIOS, but I cannot update the BIOS on this system: I updated to a P version, a beta/fix BIOS released for issues with the IPMI KVM, and apparently you cannot upgrade from that version. And the motherboard has been end of life for a while now. The memory part of that post also doesn't apply to me, seeing as it passed such a long memtest. My hardware, btw: ASRockRack X470D4U, American Megatrends Inc. BIOS version P3.30, dated Monday, 2019-11-04; CPU: AMD Ryzen 7 3700X 8-Core @ 3600 MHz; RAM: 2 sticks of 16GB ECC UDIMM from the motherboard's supported memory list.
  3. My system froze/hung several times, requiring a forced shutdown. It was completely unresponsive to everything (no web UI, no SSH, no SMB shares) but clearly still powered on. I did ping the system but I think that also got me nothing (not 100% sure, it's been a few days). The full local logs are lost, but the quote above is a part of the log as captured by my log server on a Synology (I tried to format it, hope it's readable). This seemed to me like it might be the issue, so I wanted to include the full logs as captured over the network by the Synology NAS, but I think the external version of the log file is too big at 50 MB; the upload fails. I did fix the cache file system (XFS) several times because of the dirty shutdowns, using this guide by Spaceinvader One on YouTube, and did a parity check on the array. Now, the external log made it look like it might have been a memory issue to a layman like myself, so I ran a memtest. I ran the built-in memtest on my 2x16GB ECC UDIMMs. It ran for well over 150 hours, partly because I didn't have the time or energy to keep troubleshooting and ''fixing'' the system, so I just left it to its own devices. I forgot to take a picture of the screen, but trust me, it was a lot of passes with zero errors! On a side note, somewhere in the midst of all of this I replaced 2 x 8TB disks (WD white label, shucked), 1 data and 1 parity, with 2 x 18TB disks (WD white label, shucked). And since then the S.M.A.R.T. part of the webGUI stopped working; the disks do pass the checks, but the GUI doesn't display the data properly. I did find a post on the forum about this, but none of the fixes worked for me (changing the default SMART controller type, etc.; I tried all of them). All new disks have recently passed short and extended SMART tests at least 2 times, plus the preclear script has been run using the binhex-preclear Docker image. Edit: I have added a cut-down version (removed older entries) of the externally captured log file: All_2022-2-12-21 28 6 - Copy.csv Logs of the system as it is ''now'': silverstone-diagnostics-20220212-2021_anon.zip
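     Side note on the SMART display problem: the attributes can still be read from the console with smartctl. A minimal sketch; the device path /dev/sdb and the -d sat hint are placeholders (assumptions), the real paths come from --scan:
        # List the drives smartctl can see and their device paths.
        smartctl --scan

        # Full SMART report for one drive (replace /dev/sdb with a path from --scan).
        smartctl -a /dev/sdb

        # Some HBAs/USB enclosures need an explicit device type hint before they answer, e.g.:
        smartctl -a -d sat /dev/sdb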
  4. After disabling spin down on all disks and restarting the server, it has now been running for 1 day and 3 hours without any errors, so I am assuming it is fixed.
  5. I shut down the server, re-plugged the HBA and all disks, then started it up again. After some time, new errors appeared. The thing that stands out to me is that the errors occur right after 2 disks happen to spin down. Could that be related? Also, if it is a hardware defect... I don't have a spare HBA, a properly sized power supply or a spare SAS cable to do any testing (by swapping them out), so I am at a loss as to how I should handle this right now. Is there anything I can do? silverstone-diagnostics-20210522-1828.zip
  6. Well, the errors are back again, now on Disks 6 and 7. silverstone-diagnostics-20210522-0008.zip
  7. As far as I can tell, everything seems to be normal and working again. Thanks again!
  8. Thank you for your responses, btw! How should I handle the 22 errors that 6 disks are reporting?
  9. But this issue happened after downgrading from 6.9.2 back to 6.8.3; nothing else changed. I also read that some people had compatibility issues with the newer version. I use a: https://www.broadcom.com/products/storage/host-bus-adapters/sas-9300-8i What can I do to fix this? I understand that it is hypothetically possible that my power supply failed or that it is a cable failure, but it seems incredibly unlikely to me that this would happen at the exact time that I run into OS issues due to updating and downgrading my OS version. Edit: OK, I disconnected and reconnected the HBA and my array is back, so maybe it was a badly plugged-in connector?
  10. Yes they are, but the disks themselves are fine according to SMART. This happened after upgrading to 6.9.2 and then downgrading again to 6.8.3, so I am hoping that it is just some limited filesystem inconsistency and not a major failure of the hard drives or the whole array.
  11. OK, it got worse: I finished the parity check with no errors and then tried to start the array normally, and now I have 6 unmountable disks. That is every data disk except Disk 5... Edit: I included new diagnostics: silverstone-diagnostics-20210520-2253.zip
  12. Thank you for your response. I have rebooted, and Unraid then reported zero errors. I then started the array in maintenance mode and am now doing a parity check (read only). After that, I'll try starting the array normally.
  13. As the title says. The screenshot shows the disks; diagnostics logs are included. Also: when using Krusader I cannot access Disk 3; /mnt/disk3 is somehow a file of zero bytes and not a folder. silverstone-diagnostics-20210518-2122.zip I put Unraid in maintenance mode and ran a short SMART test on Disk 3; it completed with no errors. Disk 3 XFS check with -n:
      **********************************************************************************************************
      Phase 1 - find and verify superblock...
      Phase 2 - using internal log
              - zero log...
      ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
              - scan filesystem freespace and inode maps...
      sb_fdblocks 456165658, counted 458312794
              - found root inode chunk
      Phase 3 - for each AG...
              - scan (but don't clear) agi unlinked lists...
              - process known inodes and perform inode discovery...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - agno = 4
              - agno = 5
              - agno = 6
              - agno = 7
              - process newly discovered inodes...
      Phase 4 - check for duplicate blocks...
              - setting up duplicate extent list...
              - check for inodes claiming duplicate blocks...
              - agno = 0
              - agno = 1
              - agno = 2
              - agno = 3
              - agno = 6
              - agno = 4
              - agno = 7
              - agno = 5
      No modify flag set, skipping phase 5
      Phase 6 - check inode connectivity...
              - traversing filesystem ...
              - traversal finished ...
              - moving disconnected inodes to lost+found ...
      Phase 7 - verify link counts...
      No modify flag set, skipping filesystem flush and exiting.
      **********************************************************************************************************
      Edit: I added the SMART diagnostics of all 5 disks with errors after running a short SMART test on all of them, disk numbers appended (Disk 1, etc.). Disk 3 is also running the extended SMART test atm.
      WDC_WD80EMAZ-00W_7HKJT7EJ_35000cca257f1e771-20210518-2240 - Disk 3.txt
      WDC_WD80EMAZ-00W_7HKJWUXJ_35000cca257f1f4f1-20210518-2244 - Disk 4.txt
      WDC_WD80EZAZ-11T_2SG8U7JJ_35000cca27dc401ba-20210518-2245 Disk 7.txt
      WDC_WD80EZAZ-11T_2SG9465F_35000cca27dc4271a-20210518-2244 Disk 6.txt
      WDC_WD80EZAZ-11T_7HJJ6AVF_35000cca257e38cc8-20210518-2243 Disk 1.txt
      What should I do? How bad is this?
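      Since the -n run only reports problems and the ALERT points at the unreplayed log, the usual next step would be an actual repair from maintenance mode. A minimal sketch; /dev/md3 as the device for Disk 3 is an assumption (the webGUI's filesystem check on the drive page does the same thing), and -L is a last resort that discards the log:
         # With the array started in maintenance mode, Disk 3 should be /dev/md3 (verify before running).
         # Run the repair without -n so it can actually fix the reported inconsistencies:
         xfs_repair -v /dev/md3

         # Only if xfs_repair refuses to run because of the dirty log and the filesystem cannot be
         # mounted to replay it, it will suggest -L; this zeroes the log and may lose recent metadata:
         # xfs_repair -vL /dev/md3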
  14. After recently upgrading to the latest version (6.9.2) my server crashed (not immediately, but after a time when I wasn't using it, so I don't know exactly when). I lost the logs, so I set up my Synology NAS to receive logs from my Unraid server in the future. Now, only days later, the Unraid server didn't fully crash; my Docker containers are still running! But Unraid cannot be accessed at all, neither through the webUI nor WinSCP. Included are the Unraid logs as captured by my Synology NAS, hence the unusual format. All_2021-5-14-23 15 38.html Edit: oops, I just noticed the typo in the title; it should be a server error. Edit 2: ultimately I did gain access through WinSCP and used shutdown now. Edit 3: I couldn't figure out how to fix it and got no response on this thread, so I am simply going to roll back to an older version. Hopefully whatever this is will get fixed.
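      For anyone else setting this up: the remote logging itself is configured under Settings > Syslog Server in the Unraid webGUI (pointed at the NAS). A quick way to confirm the forwarding actually works is sketched below; the only assumption is that the remote server is already filled in on that settings page:
         # Send a tagged test message through syslog; if forwarding works it should show up
         # in the Synology's received log within a few seconds.
         logger -t unraid-test "remote syslog test $(date)"

         # Check the local syslog at the same time to confirm the message was emitted at all.
         grep unraid-test /var/log/syslog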
  15. What I like most about Unraid is the community. I would love to see a built-in file browser.
  16. I just noticed that bond0 is my IPMI interface on my ASRockRack X470D4U motherboard: https://www.asrockrack.com/general/productdetail.asp?Model=X470D4U#Download bond0 and eth0 are both listing the exact same numbers, but they do have different MAC addresses, and the system has 2 cables hooked up. So this is perhaps just related to the IPMI/KVM, some weird internal traffic? (Sorry, I didn't think to mention this sooner; I thought bond0 was some ''Unraid thing''.) For the record, I was not using the IPMI/KVM functions and it still showed the constant 7.2 Mbps inbound activity; also, the KVM function won't work at the moment for some reason (''powered off, no signal''). And just now it seems the speed dropped again, to 30/40 kbps inbound on both bond0 and eth0. Weird...
  17. I had found the help button, I just hadn't gotten around to reading up on the cache yet. Thank you for the link; I'll save it and dig into all the cache information this weekend.
  18. I had not! I'm new to Unraid/command-line Linux. Can I safely post those results? I don't really understand what they are saying. Alright, I will do that then.
  19. Oh, that was purely a test, hence the share name ''disk2 test''; I don't use it. But thanks for the tip! And I didn't realize it was so easy to set the syslog server to write to one of your user shares. I changed it, thanks! In general my shares are a mess, because I was messing with the settings to see what happens. The system is still new, with no important data on it (yet). The system share was set to ''prefer'' but I guess it just needs to be ''yes''. Fixed. Yes, it was at the same time. The strange thing was that it lasted so long and was so constant. Isn't 7.2 Mbps a lot of incoming traffic if it's just the GUI? Also, it seems it's back again! I do not/did not have more than 1 tab open.
  20. No disks are being written to. I haven't set up any copy jobs. Nothing on my network is uploading at anything close to those speeds, not even in total, to the Unraid server or otherwise. And I even blocked the server from accessing the internet on my router, turned FTP off, stopped all Docker containers, and removed read permissions from the shares. Still the same steady incoming 7-ish Mbps. I am running the preclear plugin, so I would rather not reboot, and I assume that the preclear plugin should not cause incoming traffic on an Ethernet port. Maybe I am missing something super obvious and I'm just an idiot, but I don't understand what is going on here. tower-diagnostics-20191205-2223.Anonymized.zip Edit: It stopped after doing this constantly for at least 1 hour and 10 minutes.
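      If the mystery traffic comes back, a short packet capture would show exactly which host and port it is coming from. A minimal sketch, assuming tcpdump is available (it is not part of stock Unraid, so this is only an illustration) and that the interface is eth0:
         # Show the headers of 200 packets on eth0, without name resolution and excluding SSH;
         # the sender's IP and port should make the source of the ~7 Mbps obvious.
         tcpdump -i eth0 -nn -c 200 not port 22

         # Or capture for 30 seconds and count which source addresses appear most often.
         timeout 30 tcpdump -i eth0 -nn 2>/dev/null | awk '{print $3}' | sort | uniq -c | sort -rn | head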
  21. Second check completed. No errors. So, can I trust this system with my data? tower-diagnostics-20191205-1154.Anonymized.zip
  22. So I completed one parity check with the ''Write corrections to parity'' box unchecked, and no errors! Which is good, but also scary, because I still don't understand where the old errors came from. Should I run it again? tower-diagnostics-20191204-1405.Anonymized.zip
  23. Thank you, I am running the test right now. My memory is running at default settings, btw. How will we know that it's the RAM and not, say, a SAS cable, the HBA or the backplane? When I had the 8,900,000 errors, 4 disks were in the right backplane and 1 was external via USB. I moved the disks into the left backplane and left the SAS cables in place (meaning they are now connected to a different SAS cable, backplane and port on the HBA), and it recovered to zero errors. I have put the external disk into the right backplane now, and I did get the 500,000 errors on a rebuild after that. So I am worried that it might be the right backplane, the SAS cable, or that part (port?) of the HBA.
  24. OK, so I ran memtest86 for almost 16 hours on my single stick of 16 GB ECC unregistered RAM, and after almost 16 hours, not a single error. In the same period, had the system been doing parity, it would have had hundreds of thousands of errors, if not millions. Can RAM now be ruled out?
  25. Thank you very much for your advice! I am running Unraid v6.7.2 btw, forgot to mention that. Currently letting memtest86 run. So far no errors, but I will probably just let it run overnight.