JorgeB

Moderators
  • Posts: 67,386
  • Days Won: 705

Everything posted by JorgeB

  1. Disk looks perfectly healthy. Please post the complete diags: Tools -> Diagnostics
  2. As mentioned, if you have Windows 10 you can copy with Explorer from the array to the UD device directly. It won't use the network, just copy/paste normally.
  3. Simply overwrite them with the ones from the v6.7.2 zip and reboot.
  4. Both parity and disk1 are failing, so you can't do a standard replacement. Are system notifications enabled? It's unusual for two disks to fail at the same time, especially on such a small array. IMHO the best bet is to use ddrescue to clone disk1 and then do a new config with a new parity disk. P.S. You should also run an extended SMART test on disk3; it's showing some SMART issues, though they appear to be old errors.
  5. Diags are from just after rebooting, see here to try and catch the problem.
  6. Unraid can't spin down SAS devices, and every attempt generates an error that spams the log. Disable spin down for all SAS devices (set to "never").
  7. Likely a flash drive problem. Back up the flash drive's config folder, redo the flash drive and restore the config folder.
  8. You can also try downgrading Unraid; a different kernel, older or newer, might make a difference. If it's still the same with a release that was working previously, there could be something wrong with the NIC.
  9. If you read above you can see we already found out what's causing the problem: an Intel NIC using the igb driver. That doesn't mean everyone using it will crash, but AFAIK everyone who is having issues is using one.
  10. First NIC isn't initializing correctly, second one is:
      Jan 23 14:59:37 catan kernel: e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
      Jan 23 14:59:37 catan kernel: e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
      Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
      Jan 23 14:59:37 catan kernel: e1000e: probe of 0000:04:00.0 failed with error -2
      Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.1: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
      Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.1 eth0: (PCI Express:2.5GT/s:Width x4) 00:30:48:62:4e:f1
      Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.1 eth0: Intel(R) PRO/1000 Network Connection
      Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.1 eth0: MAC: 5, PHY: 5, PBA No: 2050FF-0FF
      No idea on the reason, try updating to v6.8.1, also look for a BIOS update if available.
  11. Please post the diagnostics: Tools -> Diagnostics
  12. It is, assuming you're copying large file(s). Check that write cache is enabled on the disk. You might also try a different copy program; if you're using Windows 8/10 you can copy with Windows Explorer and the copy will be made locally, bypassing the network.
  13. Any LSI with a SAS2008/2308/3008/3408 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, 9400-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed. There are also similar models with external ports if needed, e.g., 9200-8e, 9207-4i4e, etc.
  14. Yes, both are possible if you still have the original disk2. The current disk2 is totally corrupt since there were errors on more disks than parity could handle during the rebuild. Disk8 should be fine: it wasn't mounting because it was disabled and Unraid can't emulate it correctly since the current disk2 is corrupt, but if you re-enable it or mount it with UD it should mount correctly.
  15. Swap them with other ones, it's the easiest way to see if the problem goes away. It could also be a PSU problem; it's unlikely to be the HBAs since there are 2 of them and both are using the latest firmware.
  16. NVMe cache device stopped responding:
      Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 991 QID 13 timeout, aborting
      Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 992 QID 13 timeout, aborting
      Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 993 QID 13 timeout, aborting
      Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 994 QID 13 timeout, aborting
      Jan 17 14:26:45 Tower kernel: nvme nvme0: I/O 827 QID 5 timeout, aborting
      Jan 17 14:26:45 Tower kernel: nvme nvme0: I/O 406 QID 10 timeout, aborting
      Jan 17 14:26:45 Tower kernel: nvme nvme0: I/O 407 QID 10 timeout, aborting
      Jan 17 14:26:45 Tower kernel: nvme nvme0: I/O 408 QID 10 timeout, aborting
      Jan 17 14:27:14 Tower kernel: nvme nvme0: I/O 991 QID 13 timeout, reset controller
      Jan 17 14:27:45 Tower kernel: nvme nvme0: I/O 8 QID 0 timeout, reset controller
      Jan 17 14:28:48 Tower kernel: nvme nvme0: Device not ready; aborting reset
      Jan 17 14:28:48 Tower kernel: nvme nvme0: Abort status: 0x7
      ### [PREVIOUS LINE REPEATED 7 TIMES] ###
      Jan 17 14:29:18 Tower kernel: nvme nvme0: Device not ready; aborting reset
      Jan 17 14:29:18 Tower kernel: nvme nvme0: Removing after probe failure status: -19
      Jan 17 14:29:18 Tower kernel: nvme nvme0: Device not ready; aborting reset
      Jan 17 14:29:18 Tower kernel: print_req_error: I/O error, dev nvme0n1, sector 500361306
      Reboot or power cycle to see if it comes back online.
  17. Another one using the igb NIC driver, that's my main suspect.
  18. No, you just need to have something powering up the PSU. You can use a clip or, if you like, Supermicro makes a little board for that purpose, the one on top:
  19. OK, that one has a built-in expander, so you just need to connect the HBA on the main server directly to the backplane on the other box, with one or two cables for single/dual link. Since it's a SAS -> SAS connection, cables can be up to 10 meters long, but use ones that are as short as possible.
  20. Looking at the various diags/board models used I can only see one thing in common: an Intel gigabit NIC. Is anyone having this issue not using an Intel NIC that loads the igb driver? Alternatively, and I can't do it now, if anyone else affected could try disabling the Intel NIC in the BIOS, see if the crash still happens.
  21. Thanks, so we can rule out LSI. I also don't think it's plugin related; the strange thing is that it's happening with very different hardware.
  22. I didn't test myself, though I have very few plugins installed on that server, but according to at least one user in one of the threads I linked above it also happens in safe mode without any plugins.
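
For anyone who wants to check their own syslog for the same kind of entries quoted in posts 10 and 16 above (a driver probe failure, NVMe I/O timeouts, an NVMe device being removed), here is a minimal Python sketch. The function name `summarize` and the regular expressions are assumptions made for illustration; they only match the message formats seen in those excerpts and other kernel versions may word these messages differently. The sample lines are copied from the excerpts above.

```python
import re
from collections import Counter

# Sample lines copied from the kernel log excerpts quoted in posts 10 and 16.
SAMPLE_LOG = """\
Jan 23 14:59:37 catan kernel: e1000e: probe of 0000:04:00.0 failed with error -2
Jan 23 14:59:37 catan kernel: e1000e 0000:04:00.1 eth0: Intel(R) PRO/1000 Network Connection
Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 991 QID 13 timeout, aborting
Jan 17 14:26:44 Tower kernel: nvme nvme0: I/O 992 QID 13 timeout, aborting
Jan 17 14:27:14 Tower kernel: nvme nvme0: I/O 991 QID 13 timeout, reset controller
Jan 17 14:29:18 Tower kernel: nvme nvme0: Removing after probe failure status: -19
"""

# Patterns are assumptions based only on the message formats shown above.
PROBE_FAILED = re.compile(
    r"kernel: (\S+): probe of (\S+) failed with error (-?\d+)"
)
NVME_TIMEOUT = re.compile(
    r"kernel: nvme (\S+): I/O \d+ QID \d+ timeout, (?:aborting|reset controller)"
)
NVME_REMOVED = re.compile(
    r"kernel: nvme (\S+): Removing after probe failure status: (-?\d+)"
)


def summarize(log_text):
    """Collect probe failures, NVMe timeout counts and NVMe removals."""
    probe_failures = []   # (driver, PCI address, error code)
    timeouts = Counter()  # NVMe controller name -> number of timeout lines
    removed = []          # (NVMe controller name, status code)

    for line in log_text.splitlines():
        match = PROBE_FAILED.search(line)
        if match:
            driver, address, code = match.groups()
            probe_failures.append((driver, address, int(code)))
            continue
        match = NVME_TIMEOUT.search(line)
        if match:
            timeouts[match.group(1)] += 1
            continue
        match = NVME_REMOVED.search(line)
        if match:
            removed.append((match.group(1), int(match.group(2))))

    return {
        "probe_failures": probe_failures,
        "nvme_timeouts": dict(timeouts),
        "nvme_removed": removed,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

To run it against a real log, read the syslog file from the diagnostics zip into a string and pass it to `summarize` instead of `SAMPLE_LOG`. It only flags lines worth a closer look; the full diagnostics are still needed to work out the cause.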