BradJ

Everything posted by BradJ

  1. Regarding post #2 in this thread - (From another thread) I noticed this was mentioned in the release notes at one point: https://wiki.unraid.net/Manual/Release_Notes/Unraid_OS_6.10.0#Base_Packages I'm still looking for information regarding post #1 in this thread. Is this error something I should be worried about?
  2. Also, Fix Common Problems says this in the logs:
     Apr 23 04:30:06 Tower root: Fix Common Problems: Error: Machine Check Events detected on your server
     Apr 23 04:30:06 Tower root: mcelog: ERROR: AMD Processor family 25: mcelog does not support this processor. Please use the edac_mce_amd module instead.
     How do I go about using the edac_mce_amd module?
  3. Anything I should do with this error? It literally says "no action required" but I would like to know what might have caused it. Found this in the logs:
     Apr 19 08:36:47 Tower kernel: mce: [Hardware Error]: Machine check events logged
     Apr 19 08:36:47 Tower kernel: [Hardware Error]: Corrected error, no action required.
     Apr 19 08:36:47 Tower kernel: [Hardware Error]: CPU:1 (19:21:2) MC25_STATUS[-|CE|-|-|-|-|-|-|-]: 0x8000000100cbd163
     Apr 19 08:36:47 Tower kernel: [Hardware Error]: IPID: 0x0000000000000000
     Apr 19 08:36:47 Tower kernel: [Hardware Error]: Bank 25 is reserved.
     Apr 19 08:36:47 Tower kernel: [Hardware Error]: cache level: L3/GEN, tx: INSN
     Thanks for looking. tower-diagnostics-20230424-0935.zip
  4. There may have been some tension on the SATA cable. I rerouted and reseated the SATA cable. I ran another Scrub and no errors are being reported. I reset the BTRFS stats according to the post you referenced about the BTRFS monitor script. After re-running the script all errors are now 0. The script is now scheduled to run daily to monitor the cache pool. Once again, thank you JorgeB. I would be lost without you.
  5. JorgeB to the rescue again! I ran the script and I have tons of errors:
     [/dev/sdb1].write_io_errs 0
     [/dev/sdb1].read_io_errs 0
     [/dev/sdb1].flush_io_errs 0
     [/dev/sdb1].corruption_errs 0
     [/dev/sdb1].generation_errs 0
     [/dev/sdc1].write_io_errs 304324335
     [/dev/sdc1].read_io_errs 4301948
     [/dev/sdc1].flush_io_errs 2290865
     [/dev/sdc1].corruption_errs 14826483
     [/dev/sdc1].generation_errs 16809
     Do you recommend I replace the cache2 cable and then run another scrub?
  6. Okay, I have recreated the docker image and my Dockers are all working again. I'm just not sure what the underlying issue is/was. Can anything be determined from the logs?
  7. This is the last thing in the log when I try to start the Docker service:
     Sep 16 21:06:08 Tower root: mount: /var/lib/docker: wrong fs type, bad option, bad superblock on /dev/loop2, missing codepage or helper program, or other error.
     Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 6449542684566666880
     Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): bad tree block start, want 24931565568 have 12792500698256506278
     Sep 16 21:06:08 Tower kernel: BTRFS warning (device loop2): couldn't read tree root
     Sep 16 21:06:08 Tower kernel: BTRFS error (device loop2): open_ctree failed
     Sep 16 21:06:08 Tower root: mount error
     Sep 16 21:06:08 Tower emhttpd: shcmd (651): exit status: 1
     Should I recreate the docker image file or is something bigger happening here?
  8. I noticed all my Dockers were not working. Upon trying to restart the Docker service I was getting "Docker service unable to start." I did a reboot to see if that would fix the issue. The diagnostics posted are before the reboot. Upon reboot, Docker failed to start again. I noticed a lot of BTRFS errors on my cache drives in the logs. So I ran a Scrub. No help - tons of errors in the logs. I don't know what to do next. These are relatively new redundant cache drives, just a few weeks old. The first diagnostics was right when I first noticed the problem. The second diagnostics is after the reboot and scrub. Please help! Brad tower-diagnostics-20220916-2012.zip tower-diagnostics-20220916-2055.zip
  9. I tried another controller and it appears to be negotiating at x4!
     SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]
     Broadcom / LSI Serial Attached SCSI controller
     Type: Onboard Controller
     Current & Maximum Link Speed: 5GT/s width x4 (4 GB/s max throughput)
     I'm hoping the next parity check is much faster! Thanks again JorgeB!
  10. I'm on the latest BIOS. I have another controller card to try out. Thanks, as always JorgeB. If anyone else has ideas please chime in!
  11. Yes, an Nvidia GTX1060 in the x16 slot (for Plex). I also have a 2.5G network card in slot 2. The LSI card is in the third slot, which is supposed to be x4 speed according to the manual. Something must be hogging up the lanes. I'm guessing though, as my knowledge of "lanes" is very limited. I can put some of my drives on the MB SATA ports as a workaround - but what a waste of a nice LSI card to be limited to 500 MB/s.
  12. My parity checks are taking a long time - about 2.5 days when my largest drives are 14TB. I decided to look into this. I downloaded a docker called Disk Speed to do some benchmarking of the drives. I noticed this:
      SAS2308 PCI-Express Fusion-MPT SAS-2
      Hewlett Packard Enterprise (Broadcom / LSI) Serial Attached SCSI controller
      Type: Onboard Controller
      Current Link Speed: 5GT/s width x1 (500 MB/s max throughput)
      Maximum Link Speed: 8GT/s width x8 (7.88 GB/s max throughput)
      I think I have found out why my array speeds were maxing out at 500 MB/s! My MB manual says this card is in a PCIe 2.0 x4 slot. Why would this card be negotiating an x1 connection instead of something higher? Any advice on how to speed this up? Edit: This is in slot 3 of a Gigabyte B450 Aorus M motherboard. Thanks!
  13. Do you know if the size of the cache pool will automatically expand to 2TB when both drives are replaced? Or will the size have to be manually adjusted? I'll be doing this process later this week, I'm just trying to plan ahead. Thanks!
  14. I have two 500GB SSD drives in my redundant cache pool. I want to upgrade them to two 2TB drives. What is the easiest/best method? Replace one at a time and rebuild? Or copy all data off the cache pool and create a new one? Thanks for any input.
  15. One of my tracker websites "outlawed" the latest versions of Deluge: "Due to release stream issues, Versions 2.x Sourced from their PPA (or docker based on PPA) will not work here" What does this mean? More importantly, whatever the issue is, will it get fixed?
  16. I just got a new 14TB external that I would like to preclear while in the enclosure. I plugged it in and did not mount the drive. I deleted the NTFS partition to start fresh. When I click on start to preclear it just gets stuck on "Starting...". No read or write activity and I don't see anything in the log. I have precleared many drives over the years and never had a problem. How should I troubleshoot? UPDATE: Reboot of the server fixed the issue.
  17. My kid wants his Minecraft world reset to start over. I'm not sure if it's a docker question or a Minecraft console question. How would I go about doing that? I apologize if I am asking in the wrong area. Sorry, I don't play the game at all.
  18. https://www.amazon.com/gp/product/B07Y2GWVB8 These work in 6.8.3 natively.
  19. I have now replaced the drive giving read errors and everything seems to be working correctly again. Moral of the story: do not buy white label hard drives from goharddrive (at least for a NAS). In total, I have had problems with 3 out of 4 WL drives. One was replaced under warranty with an enterprise-level WL drive and that one works fine. The other ones started having weird intermittent errors right after the one-year warranty. Thank you to all that have helped me through this, I appreciate it. I will now mark this topic as solved.
  20. Lol, yes - It's a white label drive purchased from GoHardDrive. This is the second WL drive from there that Unraid has found problems with. I'll be replacing it with a 10TB WD shucked drive (yes, I'm cheap).... I'll report back once the array is rebuilt and a parity check has completed successfully.
  21. Update: Suspect Drive passed the SMART extended test. I then started a non-correcting parity check - successful with no errors. Then it got interesting. I started a correcting parity check and immediately got 134 read errors. It is my belief something is just wrong with the drive. What exactly - who knows - but something is wrong with it. Where to go from here is the question now. I'm thinking of just cancelling the current parity check and replacing the drive. Is that a good game plan or is there anything else I should consider? tower-diagnostics-20201007-1320.zip
  22. This is why I asked about this in the OP. It's a new drive (to this array) so I can not be 100% certain on the quality of this drive. Alright, just to be safe, I will run a non-correcting check first and see what happens. I'll report back on the results.
  23. That would make sense. The Wiki doesn't specifically say the errors will be zeroed out. I guess I made that conclusion in my head after I was reading about how READ errors are fixed automatically. I guess I just assumed the next parity check would zero out the errors. I will run a correcting parity check and report back (in 26+ hours).
  24. Ok, that's good advice and I will run a parity check asap. Again, any insight if the parity check should be a correcting one or not?
  25. I run a monthly parity check. The new drive has not been through a parity check since I just added it a few days ago. Yes, all of my previous parity checks have had 0 errors. I will start a parity check as soon as the extended SMART check is finished. According to the Wiki a (successful) parity check should zero out the errors as well. I'm still not sure about the parity check being a correcting one or not.
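For the edac_mce_amd question in post 2, a minimal setup sketch, assuming a stock Unraid box where the go file lives at /boot/config/go (verify the path on your system; this is illustrative, not an official procedure):

```shell
# Load the AMD machine-check decoder instead of mcelog (which does not
# support family 25 / Zen 3 CPUs, per the log message in post 2).
modprobe edac_mce_amd

# Confirm the module is present.
lsmod | grep edac

# Decoded machine-check events then show up in the kernel log:
dmesg | grep -i 'hardware error'

# To load it on every boot, append the modprobe line to Unraid's go file:
echo 'modprobe edac_mce_amd' >> /boot/config/go
```

Since Unraid runs from RAM, only changes written under /boot survive a reboot, which is why the go file is the usual place for this.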
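The BTRFS stats monitoring discussed in posts 4-5 can be sketched roughly like this. This is not the actual script referenced in that thread, just a hedged stand-in; `check_btrfs_stats` is a name invented here, and /mnt/cache is assumed to be the pool's mount point:

```shell
#!/bin/bash
# Sketch of a BTRFS error monitor: surface any nonzero counter from
# `btrfs device stats` output (the same counters shown in post 5).

check_btrfs_stats() {
    # Reads "btrfs device stats" output on stdin and prints only the
    # lines whose error counter (the second field) is nonzero.
    awk '$2 != 0'
}

# On a live Unraid server (run as root):
#   btrfs device stats /mnt/cache | check_btrfs_stats
# After fixing a bad cable, counters can be zeroed for a clean baseline,
# as was done in post 4:
#   btrfs device stats -z /mnt/cache
```

Scheduling the check daily (e.g. via the User Scripts plugin, as post 4 describes) turns a silent failing cable into an early warning instead of a corrupted pool.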
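For the downgraded-link mystery in posts 10-12, the negotiated PCIe link can be checked directly with lspci rather than a benchmarking docker. A sketch, where the bus address 01:00.0 is an example (find the HBA's real address first) and `link_width` is a small helper invented here:

```shell
# Locate the LSI controller's bus address:
#   lspci | grep -i SAS
# Then dump its link capability vs. what was actually negotiated:
#   lspci -vv -s 01:00.0 | grep -E 'LnkCap|LnkSta'
# LnkCap is what the card supports (e.g. Speed 8GT/s, Width x8);
# LnkSta is what the slot negotiated (e.g. Speed 5GT/s, Width x1).

link_width() {
    # Extracts the first "Width xN" token from an LnkCap/LnkSta line.
    grep -o 'Width x[0-9]*' | head -n 1
}
```

Comparing the two lines shows at a glance whether the bottleneck is the card or the slot; a x1 LnkSta against a x8 LnkCap matches the 500 MB/s ceiling described in post 12.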
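On the pool-expansion question in posts 13-14: recent Unraid releases generally grow a btrfs pool on their own once both members have been replaced and rebuilt, but if the extra space does not appear it can be claimed by hand. A sketch, assuming the usual /mnt/cache mount and default device IDs 1 and 2 (check yours with `btrfs filesystem show` first):

```shell
# Confirm both new 2TB devices are members of the pool:
#   btrfs filesystem show /mnt/cache
# Grow each device to its full capacity:
#   btrfs filesystem resize 1:max /mnt/cache
#   btrfs filesystem resize 2:max /mnt/cache
# Verify the new usable size:
#   btrfs filesystem df /mnt/cache
```

In a RAID1 pool, two 2TB devices yield roughly 2TB usable, since every block is mirrored.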