Posts posted by Husker_N7242C

  1. Hi everyone, I'm at a bit of a loss. Every couple of days the server goes offline and becomes unresponsive (including the local GUI and the webGUI). Sometimes a disk is randomly disabled; once I add it back it usually isn't disabled again, but a different disk gets disabled a few days later.

    I've replaced all SATA and power cables and swapped the drives around to try to find a pattern, but can't see one.

    Someone suggested that 8TB Seagate Ironwolf drives have an issue with spinning down, so I disabled spin-down, which didn't help.

    I've tried running it with all Docker containers disabled and just one VM running (no hardware passthrough), and it still hangs.

    I'm not sure the diagnostics are any help, as I think the logs are cleared after each hang, but I've attached them just in case.

    Any suggestions on what to try next?

    Much appreciated guys!
     

    Hardware basics:

    AsRock x79 Extreme 11 with E5-2670

    SATA controller: Intel Corporation C600/X79 series chipset 6-Port SATA AHCI Controller (rev 06)

    Serial Attached SCSI controller: Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)

    Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)

    32GB DDR3 1333MHz ECC Memory (4x 8GB)

    7x 8TB Seagate Ironwolf ST8000VN04-2M2101 (Parity+6 data)

    2x 3TB Seagate Barracuda ST3000DM007-1WY10G (data)

    2x 2TB Seagate NAS ST2000VN000-1HJ164 (data)

    1x 2TB Seagate Barracuda ST2000DM006-2DM164 (data)

    Plus 3x Cache pools with 7 SSDs total (mix of Samsung and Crucial/Micron)

    diagnostics-20220417-1647.zip

  2. Hi everyone,

     

    The Fix Common Problems plugin says the following:
    Machine Check Events detected on your server - Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged

     

    It doesn't give me any hint on how to look into the issue myself, and I already have mcelog running, so my diagnostics are attached. I would appreciate it if anyone could give me some pointers.

     

    Background that might be related (or might not):

    I recently removed a GPU and replaced it, if that helps. I had an original GTX Titan for transcoding for the past couple of years, but it seemed to be crashing the whole system during transcodes, so I have removed it and am waiting on a GTX 1660 Super to arrive. In the meantime I have an old GPU for GUI output only and am relying on the CPU for Plex transcoding.

     

    Thanks in advance

    nas-diagnostics-20210627-1439.zip
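
    For anyone wanting to look before the diagnostics are read: a minimal sketch for pulling machine-check lines out of a log file. The function name is hypothetical, and both the default path /var/log/syslog and the match patterns are assumptions; the plugin message above does not say where the events were logged.

    ```shell
    # Hypothetical helper: print log lines that mention machine check events.
    # The default path /var/log/syslog is an assumption; pass another file as $1.
    find_mce_lines() {
      log_file="${1:-/var/log/syslog}"
      # -i: case-insensitive, -E: extended regex matching either pattern
      grep -iE 'mce:|machine check' "$log_file"
    }
    ```

    Calling find_mce_lines with the path of a saved syslog prints only the matching lines, which is usually a short enough list to paste into a reply.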

  3. I've ended up running the pre-clear script on both drives with pre- and post-read cycles. Both passed with no errors. I've added disk2 back to the array and copied 2TB of data back to it without an error. I'll add the parity back tomorrow and rebuild.

    I saw a post in the Facebook group of (what looks like) the same thing happening to others. Parity gets disabled, nobody can work out why, they check the disk and put it back and all is good. Maybe it is a bug or UNRAID is disabling the drive too ruthlessly? 

  4. Thanks again Johnnie (sorry I wrote tee-tee earlier, I was on my mobile and mis-read).

    I attempted another parity sync but it failed and disabled the parity drive again (new diag attached).

     

    Re: checksums, I do have the Dynamix File Integrity plugin installed. It runs monthly with SHA2, but I have no clue how to use it to help my situation. I've attached a screenshot, as it shows that the disk2 "build" and "export" are not up to date.

    File Integrity.PNG

    nas-diagnostics-20200813-1709.zip
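
    For context on what a checksum comparison can show after a rebuild, here is a minimal sketch. It is not the plugin's own workflow: it assumes SHA2 here means SHA-256, uses the standard sha256sum tool, and the function names are hypothetical. The idea is that a file whose current hash still matches a hash recorded before the failure has not been altered by the rebuild.

    ```shell
    # Hypothetical helpers: record a file's hash, then verify it later.
    record_hash() {
      # $1 = file to hash, $2 = checksum list to write.
      # sha256sum writes "<hash>  <path>" lines, the format that -c reads back.
      sha256sum "$1" > "$2"
    }

    verify_hash() {
      # $1 = checksum list. Exit status 0 means every listed file still matches.
      sha256sum -c "$1"
    }
    ```

    A non-zero exit status from verify_hash means at least one listed file no longer matches its recorded hash.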

  5. Thanks tee-tee. I really appreciate the reply. I've rebooted, and this time it has let me put the parity drive back and is trying a sync.
    I don't think that disk2 can possibly have rebuilt correctly, though. The rebuild failed after about 2 hours.
    Even if this parity sync finishes, I'm not sure I can trust that I don't have a corrupt allocation table or something on disk2.

    I've attached fresh diagnostics in the hope that they will now contain something useful.

    nas-diagnostics-20200812-1906.zip

  6. Hi guys, I've had a really unusual set of circumstances result in what appears to be multiple drive failures (but isn't). I've probably lost 2TB of data, I'm at risk of losing 8TB more, and I have no parity now. Please help if you can (diagnostics attached).

    Order of events:

    1. disk2 was disabled by UNRAID for SMART errors (7 year old 2TB)

    2. Replaced disk2 with new Ironwolf 8TB

    3. The rebuild failed for some reason and the parity disk was disabled by UNRAID. The parity disk is a fairly new 8TB Ironwolf, also with no SMART errors to date.

    4. I tried re-seating the cables and rebooting, just in case something had come loose

    5. After reboot disk2 appears to be rebuilt which is impossible in the short time it was rebuilding

    6. Parity drive is still disabled and stopping the array and trying to remove and re-add it doesn't work, UNRAID doesn't want it.

    7. I tried removing disk2 in the GUI because it is obviously corrupt. UNRAID tries putting data on it but the data just vanishes into oblivion.

    8. Ran a read check on all drives hoping that UNRAID would see that disk2 is corrupt and let me do something with it but it got 0 errors from disk2

    9. disk4 got 120,000 read errors from the test. It is another nearly new 8TB Ironwolf with no SMART errors, and it was fine until now.

     

    I don't even know where to start with this. I realise that the 2TB from disk2 is probably gone. I can live with that, I guess. I really don't want to lose another 8TB from disk4.

    Any help would be received gratefully.

    MAIN_screenshot.png

    nas-diagnostics-20200812-1631.zip

  7. Hi guys, as the title says, I had an RX570 passed through to a VM without problems for the past few months (except that it won't reset when the VM is rebooted). Following a power failure, it can no longer be allocated to a VM: it isn't an option in the drop-down, only VNC is. The RX570 DOES appear in Devices (as below), so I'm a bit miffed.

    [1002:67df]09:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev ef)

    [1002:aaf0]09:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]

     

    The GTX Titan doesn't get used for VMs, that just does the GUI output and renders video for PLEX docker (Hardware Encoding).

     

    Logs are attached, so any advice would be appreciated. I'm also having trouble with USB devices, so if you see or know of anything I can do to get USB passthrough working well with this hardware, please let me know. I'd really appreciate it.

     

    nas-diagnostics-20200209-0822.zip
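
    A small sketch for pulling the vendor:device ID and PCI address out of Devices lines like the two quoted above, which can help when comparing the listing before and after an event such as the power failure. The function name is hypothetical, and the pattern assumes the [vendor:device]bus:device.function layout shown in the post.

    ```shell
    # Hypothetical helper: read Devices lines on stdin, print "ID ADDRESS" pairs.
    parse_pci_lines() {
      # Capture the bracketed vendor:device ID and the bus:device.function address;
      # lines that do not match the assumed layout are skipped.
      sed -nE 's/^\[([0-9a-f]{4}:[0-9a-f]{4})\]([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]).*/\1 \2/p'
    }
    ```

    Piping the two lines from the post through parse_pci_lines gives one line per function of the card (the VGA function and its HDMI audio function), which are normally passed through together.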

  8. I'm running v6.7.0. Opening the VM tab takes a while because it seems to cause many disks to spin up.

    What is the cause, and is there a way to avoid or reduce this?

    It doesn't seem to matter if there are VMs running or not.

    I have a few VMs that I rarely use with vdisks on the array but the main 2 VMs that are always running are on the cache.

    Thanks guys.

  9. Guys, my cache pool was unbalanced, which was causing it to pause the VMs for no reason. I ran btrfs balance start -dusage=50 /mnt/cache from the terminal, which completed successfully. Then I tried to run btrfs balance 75, but it gave an error that the cache was read-only. The VMs and the docker service crashed shortly after, and I'm stuck.

     

    I ran docker safe permissions but it didn't help.

     

    I'd appreciate some advice. Thanks!

    nas-diagnostics-20190614-0801.zip
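
    For reference, a dry-run sketch of the balance sequence described above. It only prints the commands and executes nothing. It assumes the second step ("btrfs balance 75") was meant as the same command with -dusage=75, which the post does not state explicitly; the helper name is hypothetical.

    ```shell
    # Hypothetical dry-run helper: print (do not run) a usage-filtered data balance.
    # -dusage=N restricts the balance to data block groups below N percent usage.
    balance_cmd() {
      usage="$1"
      mount_point="${2:-/mnt/cache}"
      printf 'btrfs balance start -dusage=%s %s\n' "$usage" "$mount_point"
    }

    balance_cmd 50   # first pass, as given in the post
    balance_cmd 75   # assumed form of the second pass
    ```

    Printing the commands first makes it easy to confirm the mount point and filter value before running anything against a pool that has already gone read-only.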
