Everything posted by JorgeB

  1. Now stop the array, re-assign disk3, and start the array to begin rebuilding. During the rebuild, monitor the log for any disk-related errors similar to these (see the monitoring example after this list):
     Sep 22 14:12:46 wowserver02 kernel: sd 18:0:4:0: attempting task abort!scmd(0x00000000b2e31524), outstanding for 15103 ms & timeout 15000 ms
     Sep 22 14:12:46 wowserver02 kernel: sd 18:0:4:0: [sdm] tag#7355 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
     Sep 22 14:12:46 wowserver02 kernel: scsi target18:0:4: handle(0x000d), sas_address(0x4433221105000000), phy(5)
     Sep 22 14:12:46 wowserver02 kernel: scsi target18:0:4: enclosure logical id(0x500605b00b101dd0), slot(6)
     Sep 22 14:12:46 wowserver02 kernel: scsi target18:0:4: enclosure level(0x0000), connector name( )
     Sep 22 14:12:48 wowserver02 kernel: sd 18:0:4:0: [sdm] tag#3975 UNKNOWN(0x2003) Result: hostbyte=0x0b driverbyte=DRIVER_OK cmd_age=18s
     Sep 22 14:12:48 wowserver02 kernel: sd 18:0:4:0: [sdm] tag#3975 CDB: opcode=0x88 88 00 00 00 00 03 a3 81 2a 78 00 00 00 08 00 00
     Sep 22 14:12:48 wowserver02 kernel: I/O error, dev sdm, sector 15628053112 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
     Sep 22 14:12:48 wowserver02 kernel: sd 18:0:4:0: task abort: SUCCESS scmd(0x00000000b2e31524)
     Sep 22 14:12:49 wowserver02 kernel: sd 18:0:4:0: Power-on or device reset occurred
     If you see them, there are still issues, probably power/connection related.
  2. I would try running with just one DIMM at a time; if it keeps crashing with either one, that basically rules the RAM out, and the next suspect would be the board/CPU.
  3. Run xfs_repair again without -n, and if it asks for -L, use it (see the command sketch after this list).
  4. Nothing relevant that I can see; this usually suggests a hardware problem. One more thing you can try is to boot the server in safe mode with all dockers/VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.
  5. A pool with one or more SSDs, or just a single fast NVMe device. You are virtualizing Unraid; depending on how you are doing it, that might have a performance impact, or there could be a controller bottleneck. Post the output of:
     lspci -d 1000: -vv
  6. Pretty sure it doesn't, it only fixes the shares under /mnt/user, not /mnt/user.
  7. You need to rebuild it, but first start the array with the disk still missing to see if the emulated disk is mounting and the contents look correct. P.S. I'm seeing some issues with disk1 in the last diags; looks like a power/connection issue.
  8. Forgot to mention, you must run without -n or nothing will be done.
  9. Strange, there's clearly more data on disk1; check the filesystem on disk1 to see if there's any change.
  10. Go to Shares, click on "compute all" and please post a screenshot of the results.
  11. I'm starting to think the same; it's worth checking when it happens. The big clue here was that the flash drive share was still accessible, and that's obviously not under /mnt/user, so it reminded me of a previous case where the permissions were incorrect, though that was with v6.9.
  12. Not sure how the usage could be off by that much; df is reporting the same. What kind of files do you have there? Media, or more compressible files? (See the du comparison after this list.)
  13. Possibly, though sometimes detecting it and other times not is strange.
  14. NIC problems:
      Sep 22 02:16:58 Tower kernel: e1000 0000:03:01.0 eth0: Detected Tx Unit Hang
      Sep 22 02:16:58 Tower kernel: Tx Queue <0>
      Sep 22 02:16:58 Tower kernel: TDH <1000001>
      Sep 22 02:16:58 Tower kernel: TDT <1000001>
      Sep 22 02:16:58 Tower kernel: next_to_use <1>
      Sep 22 02:16:58 Tower kernel: next_to_clean <0>
      Sep 22 02:16:58 Tower kernel: buffer_info[next_to_clean]
      Sep 22 02:16:58 Tower kernel: time_stamp <fffbd99a>
      Sep 22 02:16:58 Tower kernel: next_to_watch <0>
      Sep 22 02:16:58 Tower kernel: jiffies <fffbe940>
      Sep 22 02:16:58 Tower kernel: next_to_watch.status <0>
      Sep 22 02:17:00 Tower kernel: e1000 0000:03:01.0 eth0: Detected Tx Unit Hang
      Sep 22 02:17:00 Tower kernel: Tx Queue <0>
      Sep 22 02:17:00 Tower kernel: TDH <1000001>
      Sep 22 02:17:00 Tower kernel: TDT <1000001>
      Sep 22 02:17:00 Tower kernel: next_to_use <1>
      Sep 22 02:17:00 Tower kernel: next_to_clean <0>
      Sep 22 02:17:00 Tower kernel: buffer_info[next_to_clean]
      Sep 22 02:17:00 Tower kernel: time_stamp <fffbd99a>
      Sep 22 02:17:00 Tower kernel: next_to_watch <0>
      Sep 22 02:17:00 Tower kernel: jiffies <fffbf140>
      Sep 22 02:17:00 Tower kernel: next_to_watch.status <0>
      Sep 22 02:17:01 Tower dhcpcd[1478]: br0: probing for an IPv4LL address
      Sep 22 02:17:02 Tower kernel: e1000 0000:03:01.0 eth0: Detected Tx Unit Hang
      Sep 22 02:17:02 Tower kernel: Tx Queue <0>
      Sep 22 02:17:02 Tower kernel: TDH <1000001>
      Sep 22 02:17:02 Tower kernel: TDT <1000001>
      Sep 22 02:17:02 Tower kernel: next_to_use <1>
      Sep 22 02:17:02 Tower kernel: next_to_clean <0>
      Sep 22 02:17:02 Tower kernel: buffer_info[next_to_clean]
      Sep 22 02:17:02 Tower kernel: time_stamp <fffbd99a>
      Sep 22 02:17:02 Tower kernel: next_to_watch <0>
      Sep 22 02:17:02 Tower kernel: jiffies <fffbf940>
      Sep 22 02:17:02 Tower kernel: next_to_watch.status <0>
      Sep 22 02:17:04 Tower kernel: ------------[ cut here ]------------
      Sep 22 02:17:04 Tower kernel: NETDEV WATCHDOG: eth0 (e1000): transmit queue 0 timed out
      Do you have a different one you could try with?
  15. Yes, I'm not sure exactly why it happens, and possibly more so with btrfs: image-type files can grow over time and report less space than they are actually using. This doesn't happen, or at least it's much less obvious, if the images are regularly trimmed/unmapped. One way to confirm would be to move the images off the pool, then move them back using cp --sparse=always (see the sketch after this list). There are also reports that defragging the filesystem helps, as long as you don't use snapshots; in that case defragging would be a bad idea.
  16. Unraid is not RAID; you'll never get the same speeds as other solutions that stripe drives. You can have fast pools, but the array will always be limited by single-disk speed.
  17. Remove all the RAM from CPU0 and move half of the other CPU's RAM over to it; if it boots, start adding the removed DIMMs back.
  18. Disabled and unmountable are two different things. Where is the old disk9? It's not in the diags posted.
  19. The syslog in the diags starts over after every boot; enable the syslog server and post that log if it crashes again.
  20. Looks more like a power/connection issue:
      Sep 21 16:26:33 NetPlex kernel: pm80xx0:: mpi_sata_completion 2455:IO failed device_id 18439 status 0xf tag 95
      Sep 21 16:26:33 NetPlex kernel: pm80xx0:: mpi_sata_completion 2490:SAS Address of IO Failure Drive:50000d1109b27a9e
      Sep 21 16:26:33 NetPlex kernel: sas: sas_to_ata_err: Saw error 135. What to do?
      I'm not familiar with these controllers and their errors, though.
  21. Since disk4 is disabled, unassign it, start the array, and post new diags.
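
Example for item 1: a minimal sketch of following the log during the rebuild from a console or SSH session; the grep pattern is only an illustration and can be adjusted:
     # follow the system log and highlight typical disk-related messages
     tail -f /var/log/syslog | grep -Ei 'error|task abort|reset|timeout'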
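Example for items 3 and 8: a sketch of the xfs_repair sequence, assuming the array is started in maintenance mode and disk1 is the affected disk; the device name (/dev/md1) is an assumption, use the one matching the actual disk and your Unraid version:
     # check-only pass; -n makes no changes
     xfs_repair -n /dev/md1
     # actual repair; must run without -n or nothing will be done
     xfs_repair /dev/md1
     # only if xfs_repair explicitly asks for -L: zero the log, then repair
     xfs_repair -L /dev/md1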
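Example for item 12: one way to compare the apparent size of the files with the space they actually allocate; the path is a placeholder:
     # logical (apparent) size of the files
     du -sh --apparent-size /mnt/cache/domains
     # space actually allocated on disk
     du -sh /mnt/cache/domains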
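Example for item 15: a sketch of re-sparsifying an image with cp --sparse=always, assuming the VM is shut down first; all paths and file names are placeholders:
     # move the image off the pool to a temporary location
     mv /mnt/cache/domains/vm1/vdisk1.img /mnt/disk1/tmp/vdisk1.img
     # copy it back, recreating holes for zeroed blocks
     cp --sparse=always /mnt/disk1/tmp/vdisk1.img /mnt/cache/domains/vm1/vdisk1.img
     # remove the temporary copy once the VM is confirmed working
     rm /mnt/disk1/tmp/vdisk1.img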