JorgeB

Moderators
  • Posts

    67,797
  • Joined

  • Last visited

  • Days Won

    708

Everything posted by JorgeB

  1. That looks more like a hardware problem.
  2. Assuming there's nothing more before or after that in the syslog, I can't really say what that's about.
  3. Power cycle the server, as rebooting might not be enough. If it still doesn't come back, swap slots with another device; if it still doesn't, it's likely dead.
  4. Please post diags when the IPs are correct and also from when they switch, you can type diagnostics in the console if you lose access to the GUI.
  5. Cache device dropped offline:

       Jun 23 07:43:03 ForwardUntoDawn kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
       Jun 23 07:43:03 ForwardUntoDawn kernel: ata2.00: revalidation failed (errno=-5)
       Jun 23 07:43:09 ForwardUntoDawn kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
       Jun 23 07:43:09 ForwardUntoDawn kernel: ata2.00: failed to IDENTIFY (I/O error, err_mask=0x100)
       Jun 23 07:43:09 ForwardUntoDawn kernel: ata2.00: revalidation failed (errno=-5)
       Jun 23 07:43:09 ForwardUntoDawn kernel: ata2.00: disabled

     Then it came back with another identifier. I suggest you shut down, check/replace the cables and power back up; the device should then be correctly assigned. If it is, just start the array; if not, start it without that device assigned, then stop, re-assign it and start again.
  6. You can clear the array by doing a new config (Tools -> New config)
  7. Monthly scrub is a good idea, but much more important is to monitor the pool for any errors since the GUI currently doesn't show that.
  8. Yes, as long as no RAID controllers are involved, or you use the same one.
  9. You don't need to fully disable c-states, just enable the correct power supply control setting, unless it doesn't exist in your board BIOS.
  10. That's from mcelog, not an issue. The screens are showing some ATA issues: the first one shows issues with ATA1, and on the second one the link is down. Check all cabling, including power.
  11. Yes, because there's redundancy, but you should bring the dropped device online and run a scrub to sync the pool. It's also good to monitor the pool to catch any future issues, more info here.
  12. Yep, that's the one, it dropped offline:

        Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, aborting
        Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 171 QID 2 timeout, aborting
        Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 172 QID 2 timeout, aborting
        Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 173 QID 2 timeout, aborting
        Jun 24 04:50:09 Tower kernel: nvme nvme0: I/O 174 QID 2 timeout, aborting
        Jun 24 04:50:37 Tower kernel: nvme nvme0: I/O 12 QID 0 timeout, reset controller
        Jun 24 04:50:39 Tower kernel: nvme nvme0: I/O 170 QID 2 timeout, reset controller
        Jun 24 04:53:40 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1

      The below can sometimes help. If not, try a different PCIe/M.2 slot if available, or a different model device.

      Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append initrd=/bzroot":

        nvme_core.default_ps_max_latency_us=0

      e.g.:

        append initrd=/bzroot nvme_core.default_ps_max_latency_us=0

      Reboot and see if it makes a difference.
  13. It's a problem with that device and newer kernels: https://us.community.samsung.com/t5/Monitors-and-Memory/SSD-980-heat-spikes-to-84-C-183-F/td-p/2002779
  14. Very strange, just yesterday installed Win Server 2019 and it installed the driver without any issues.
  15. HBA is being detected and initialized correctly, main suspect would be the cables, do you know the exact cables you're using?
  16. Normalized values look good, RAW value probably not the main indicator for this device, I wouldn't worry for now.
  17. Parity is invalid, autostart will start to work once it's synced.
  18. Crashing without anything logged suggests a hardware problem. One thing you can try is to boot the server in safe mode with all docker/VMs disabled and let it run as a basic NAS for a few days. If it still crashes it's likely a hardware problem; if it doesn't, start turning on the other services one by one.
  19. Yes, as long as all pool members are unassigned it's fine.
  20. You just need to unassign them all, then when the disks are re-formatted xfs (or btrfs encrypted) just re-assign all the pool devices and start the array, it will import the existing pool.
  21. If the disks are still empty this would be the easiest way to solve this issue, or if you really want to use btrfs change them to btrfs encrypted.
  22. I assume this was old disk5?

        Model Family:     Western Digital Red
        Device Model:     WDC WD40EFRX-68N32N0
        Serial Number:    WD-WCC7K6DJD90S

      If yes it's showing some SMART issues, so you should run an extended SMART test, before or after you can also mount the disk with the UD plugin to confirm it's mounting and contents look correct.
  23. Don't try fixing the filesystem, that's not the problem, the pool should be OK. The problem is that parity is registering as an invalid btrfs filesystem during the device scan:

        Jun 23 10:35:47 Cold1 emhttpd: shcmd (236): /sbin/btrfs device scan
        Jun 23 10:35:48 Cold1 root: ERROR: cannot scan /dev/sdi1: Input/output error

      This is why the pool doesn't mount. It's a known issue that usually only happens when there's a single btrfs array device, though because of how parity works it can also happen when there's an odd number of btrfs array devices, and you have 5. Any recent changes to the array, like did you add a new disk?
  24. Filesystem      Size  Used Avail Use% Mounted on
      /dev/md5        3.7T   26G  3.7T   1% /mnt/disk5

      As expected emulated disk5 is now empty since it was formatted, best bet is to connect the old disk5, don't assign it anywhere, just post new diags so we can check SMART.
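For the pool monitoring recommended in posts 7 and 11, here is a minimal Python sketch. It assumes the usual `btrfs device stats` text output, one `[device].counter value` pair per line. The function name `nonzero_counters` and the sample values are illustrative only and do not come from any of the threads.

```python
import re

# One counter per line, e.g. "[/dev/sdb1].write_io_errs    0"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)\s*$")


def nonzero_counters(stats_text):
    """Return {device: {counter: value}} for every counter above zero."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        value = int(match.group("value"))
        if value > 0:
            problems.setdefault(match.group("dev"), {})[match.group("counter")] = value
    return problems


# Hypothetical sample output, for illustration only.
SAMPLE = """\
[/dev/sdb1].write_io_errs    0
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].corruption_errs  0
[/dev/sdc1].write_io_errs    12
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].corruption_errs  3
"""

print(nonzero_counters(SAMPLE))
# {'/dev/sdc1': {'write_io_errs': 12, 'corruption_errs': 3}}
```

The input text would come from running `btrfs device stats` against the pool's mount point (for example `/mnt/cache`, an assumed path). Any non-empty result is a reason to check cables and run a scrub.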
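The boot option change described in post 12 can be sketched as a small text transformation. This is only an illustration of the resulting edit, not a replacement for editing through the GUI. The sample config is an assumed, simplified layout, and the helper name `add_param_to_default` is hypothetical.

```python
PARAM = "nvme_core.default_ps_max_latency_us=0"


def add_param_to_default(config_text, param=PARAM):
    """Add `param` to the append line of the label block marked 'menu default'."""
    lines = config_text.splitlines()
    # Index of every line that opens a "label" block.
    starts = [i for i, line in enumerate(lines) if line.split()[:1] == ["label"]]
    for begin, end in zip(starts, starts[1:] + [len(lines)]):
        block = range(begin, end)
        # Simplification: keywords are matched in lowercase only.
        if not any(lines[i].split() == ["menu", "default"] for i in block):
            continue
        for i in block:
            words = lines[i].split()
            if words[:1] == ["append"] and param not in words:
                lines[i] = lines[i].rstrip() + " " + param
    return "\n".join(lines)


# Assumed, simplified config for illustration only.
SAMPLE_CFG = """\
default menu.c32
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui"""

print(add_param_to_default(SAMPLE_CFG))
```

Only the default entry is changed, and running the function a second time leaves the text as it is, so the parameter is never duplicated.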
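Post 23's point about parity registering as a btrfs filesystem can be illustrated with a simplified model. It assumes single parity is a byte-wise XOR of the data disks and looks only at the btrfs superblock magic, which is the same on every btrfs device. Identical bytes at the same offset on an odd number of disks XOR back to themselves, while on an even number they cancel out. The helper `xor_parity` is illustrative, and real disks differ in their other superblock fields.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


MAGIC = b"_BHRfS_M"  # btrfs superblock magic, identical on every btrfs device

odd = xor_parity([MAGIC] * 5)   # five btrfs data disks
even = xor_parity([MAGIC] * 4)  # four btrfs data disks

print(odd)   # b'_BHRfS_M'
print(even)  # b'\x00\x00\x00\x00\x00\x00\x00\x00'
```

With one or five btrfs data disks the magic survives into parity, so a device scan can mistake the parity disk for a (broken) btrfs device. With four it does not.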