Everything posted by JorgeB

  1. Looks like a controller problem:

     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-48
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-33
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Controller reset type is 3
     Oct 21 12:48:28 tobor-server kernel: aacraid 0000:01:00.0: Issuing IOP reset
     Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: IOP reset failed
     Oct 21 12:49:49 tobor-server kernel: aacraid 0000:01:00.0: ARC Reset attempt failed

     If possible use one of the recommended controllers, like an LSI HBA. You also have filesystem corruption on multiple disks (see the filesystem check sketch after this list).
  2. See if you get the diags or at least the syslog using the console, if not you'll need to force a reboot.
  3. If it's still crashing in maintenance mode then there are other issues, and filesystem corruption is a result of the crashing, not the reason.
  4. So then you'd need to set the VM timeout to something like 5 minutes and the general timeout to 6 minutes, or just shut down the VMs manually before rebooting/stopping the array, that's what I do (see the VM shutdown sketch after this list).
  5. You should always start a new thread, but since you're here: there were simultaneous errors on 5 disks, disks 1 through 5. disk2 got disabled because it was the first to get a write error, but it could have happened to any of the 5; this is usually a power/connection problem.
  6. Sorry, misread, yes that's the latest one, and it's whatever is on the LSI site.
  7. This is a SAS1 backplane, half the bandwidth of SAS2, and it will also likely have issues with drives > 2.2TB. Fill up the front backplane first, then connect both cables from one HBA to the back backplane to check if it supports dual link, you can confirm with the output of:

     cat /sys/class/sas_host/host#/device/port-#\:0/sas_port/port-#\:0/num_phys

     Replace all the #s with the correct host number, if you don't know it post the diags (or see the sketch after this list); if the output is 4 it means single link, 8 means dual link.
  8. The Unraid driver is constantly crashing during the parity check, this happens with some hardware; best bet is to try v6.10-rc1, the newer kernel might help.
  9. Doesn't look like a disk problem, you can run an extended SMART test to confirm.
  10. You need to increase the general timeout, they can't be the same or the second one will kick in. Still, it looks like 180 wasn't enough for the VMs to shut down, do they shut down if you stop the array? If yes post new diags after doing that.
  11. It's a dual expander model, primary ports go to one expander, secondary ports to the other; with Unraid you only need to use the primary. No, for best performance you can connect one HBA using dual link to the front backplane, Supermicro recommends using primary ports J1 and J2, but it should work with any two ports on the same expander. Also, the 9211 with dual link will bottleneck due to being PCIe 2.0, a PCIe 3.0 HBA like the 9207-8i or 9300-8i would be better, assuming the board/CPU supports PCIe 3.0 (see the bandwidth sketch after this list). Not sure the second backplane supports dual link, you can test or post the model for that one to see if I can find any info, though with half the drives it will have about the same bandwidth with a single link as the front one with dual, and Unraid's max array size is 30 devices, so you'll never need to use all 36 bays simultaneously.
  12. Connect the disk to the onboard SATA controller and check if it's detected in the BIOS.
  13. It's logged as a disk problem, wait for the extended test to finish and act according to the result.
  14. To see the cache share you need to enable disk shares, but keep in mind that you can't copy/move files from disk shares to user shares or vice versa, or you risk losing data.
  15. This just means that cache is hitting the minimum space set for that share.
  16. SMART overall-health self-assessment is mostly meaningless, important part is that the extended test failed:

      SMART Extended Self-test Log Version: 1 (1 sectors)
      Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
      # 1  Extended offline    Completed: read failure        90%            41827           312950152

      Disk needs to be replaced.
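
A minimal sketch for the filesystem check mentioned in item 1, assuming XFS-formatted array disks and Unraid's /dev/mdX naming for array devices (the exact device name varies by disk number and Unraid version); run it from the console with the array started in maintenance mode:

     # Read-only check of array disk 1, no changes are made
     xfs_repair -n /dev/md1
     # If problems are reported, repeat without -n to actually repair;
     # only use -L if xfs_repair explicitly asks for it

The same check can also be started from each disk's page in the GUI while in maintenance mode.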
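
For item 2, a sketch of grabbing the diagnostics or at least the syslog from the local console when the GUI is unreachable, assuming a standard install with the flash drive mounted at /boot:

     # Collect the diagnostics zip (saved to the logs folder on the flash drive)
     diagnostics
     # If that hangs, at least preserve the current syslog before forcing a reboot
     cp /var/log/syslog /boot/syslog-before-reboot.txt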
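
For the manual VM shutdown in item 4, a sketch using virsh (Unraid's VM manager is libvirt based); the 5-minute wait is an assumption chosen to match the timeouts discussed above:

     # Ask every running VM to shut down cleanly
     for vm in $(virsh list --name --state-running); do
         virsh shutdown "$vm"
     done
     # Wait up to ~5 minutes for them to finish
     for i in $(seq 1 60); do
         [ -z "$(virsh list --name --state-running)" ] && break
         sleep 5
     done
     # Anything still listed after this would need a hard stop with 'virsh destroy'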
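
For item 7, a sketch that checks every SAS host so you don't have to fill in the host number by hand; the sysfs paths follow the same pattern as the command above, though port numbering can differ between HBAs:

     # Print phy count per HBA port: 4 = single link, 8 = dual link
     for p in /sys/class/sas_host/host*/device/port-*/sas_port/port-*/num_phys; do
         [ -e "$p" ] && echo "$p: $(cat "$p")"
     done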
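
For the extended SMART tests in items 9 and 13, a sketch from the console (the same test can be started from the disk's page in the GUI); /dev/sdX is a placeholder for the actual device:

     # Start the extended (long) self-test; it runs inside the drive and can take many hours
     smartctl -t long /dev/sdX
     # Check progress and the final result in the self-test log
     smartctl -a /dev/sdX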
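
For the PCIe 2.0 bottleneck mentioned in item 11, a rough back-of-envelope with approximate usable figures:

     SAS2 dual link: 8 phys  x ~600 MB/s = ~4800 MB/s
     PCIe 2.0 x8:    8 lanes x  500 MB/s = ~4000 MB/s  -> the 9211 slot is the limit
     PCIe 3.0 x8:    8 lanes x ~985 MB/s = ~7900 MB/s  -> enough headroom for dual link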