JorgeB

Everything posted by JorgeB

  1. Looks like the unit has dual controllers, Unraid doesn't support SAS multipath, try connecting just one cable from the HBA to the enclosure. P.S. You're using firmware 20.00.02.00 on the LSIs, they should be updated to 20.00.07.00 as any p20 firmware except the latest has known issues.
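As a rough illustration of the firmware check in the post above, the sketch below compares a dotted firmware string against the latest P20 release named in the post (20.00.07.00). The helper names `parse_fw` and `needs_update` are hypothetical and not part of any LSI or Unraid tool.

```python
# Hypothetical helpers; the version strings are the ones quoted in the post.
LATEST_P20 = "20.00.07.00"


def parse_fw(version):
    """Split a dotted firmware string such as '20.00.02.00' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_update(current, latest=LATEST_P20):
    """Return True when the current firmware is older than the latest release."""
    return parse_fw(current) < parse_fw(latest)


if __name__ == "__main__":
    # Firmware reported on the HBAs in the post
    print(needs_update("20.00.02.00"))
```

Comparing tuples of integers avoids the pitfalls of comparing the raw strings, for example when a field grows from one digit group to another.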
  2. You just need to re-assign all drives to the original positions, then check "parity is already valid" before array start. Sorry, misread as "I do know the correct order", try to mount them individually with the UD plugin in read only mode, if you only had one parity the drive that doesn't mount is parity, if you had dual parity post below.
  3. Cache pool is very damaged due to errors on one device:

     May 20 19:01:31 Platter kernel: BTRFS info (device sdl1): bdev /dev/sdl1 errs: wr 58001047, rd 56786378, flush 32039, corrupt 232381, gen 0

     See here for better pool monitoring, there are also some recovery options here if needed, but pool should be recreated in any case.
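For anyone who wants to watch for this kind of line in their own syslog, here is a minimal sketch that extracts the per-device error counters from a BTRFS `errs:` line like the one quoted in the post. The regular expression is written against that single sample line, and the function names are hypothetical; other kernel versions may format the line differently.

```python
import re

# Pattern based on the single log line quoted in the post (an assumption,
# not an official format description).
ERRS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a BTRFS 'errs:' line, or None if no match."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


def has_errors(counters):
    """True when any of the counters is non-zero."""
    return any(value > 0 for value in counters.values())


SAMPLE = (
    "May 20 19:01:31 Platter kernel: BTRFS info (device sdl1): "
    "bdev /dev/sdl1 errs: wr 58001047, rd 56786378, flush 32039, "
    "corrupt 232381, gen 0"
)

if __name__ == "__main__":
    device, counters = parse_btrfs_errs(SAMPLE)
    print(device, counters, has_errors(counters))
```

On a healthy pool all counters should stay at zero, so any non-zero value is worth investigating.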
  4. Diags are after rebooting, if it happens again post diags before rebooting.
  5. Read errors on multiple disks:

     May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=8
     May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=16
     May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=24
     May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=8
     May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=16
     May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=24
     May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=8
     May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=16
     May 20 22:48:29 Silverstone kernel: md: disk6 read error, sector=24
     May 20 22:48:29 Silverstone kernel: Buffer I/O error on dev md1, logical block 0, async page read
     ### [PREVIOUS LINE REPEATED 1 TIMES] ###
     May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=32
     May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=40
     May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=48

     This is likely a power, connection or controller problem.
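To see at a glance whether read errors hit one disk or several (several at the same time points more towards power, cabling or the controller than a single bad drive), a small sketch like the following can tally the `md: diskN read error` lines per disk. The pattern is based only on the lines quoted in the post, and `count_read_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern derived from the syslog lines quoted in the post.
READ_ERR_RE = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def count_read_errors(lines):
    """Count 'md: diskN read error' lines per disk."""
    counts = Counter()
    for line in lines:
        match = READ_ERR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


SAMPLE = [
    "May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=8",
    "May 20 22:48:29 Silverstone kernel: md: disk4 read error, sector=16",
    "May 20 22:48:29 Silverstone kernel: md: disk7 read error, sector=8",
    "May 20 22:48:29 Silverstone kernel: Buffer I/O error on dev md1, logical block 0, async page read",
    "May 20 22:48:29 Silverstone kernel: md: disk1 read error, sector=32",
]

if __name__ == "__main__":
    for disk, count in sorted(count_read_errors(SAMPLE).items()):
        print(disk, count)
```

To run it against a real log, pass the lines of the syslog file instead of `SAMPLE`, for example `count_read_errors(open(path))` with your own path.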
  6. Likely won't make any difference, and should have a backup of the image itself, not just the XMLs.
  7. I can see this:

     May 15 07:39:08 Tower emhttpd: shcmd (6380905): /etc/rc.d/rc.libvirt stop
     May 15 07:39:08 Tower root: error: failed to connect to the hypervisor
     May 15 07:39:08 Tower root: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
     May 15 07:39:08 Tower root: Waiting on VMs to shutdownerror: failed to connect to the hypervisor
     May 15 07:39:08 Tower root: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
     May 15 07:39:08 Tower root:
     May 15 07:39:08 Tower root: error: failed to connect to the hypervisor
     May 15 07:39:08 Tower root: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
     May 15 07:39:08 Tower root: Stopping libvirtd...
     May 15 07:39:09 Tower root: error: failed to connect to the hypervisor
     May 15 07:39:09 Tower root: error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': Connection refused
     May 15 07:39:09 Tower root: /etc/rc.d/rc.libvirt: line 157: kill: (23854) - No such process
     May 15 07:39:12 Tower root: Stopping virtlogd...
     May 15 07:39:13 Tower root: Stopping virtlockd...
     May 15 07:39:14 Tower emhttpd: shcmd (6380906): umount /etc/libvirt
     May 15 07:39:14 Tower root: umount: /etc/libvirt: target is busy.

     VM service failed to stop and unmount libvirt, so it's normal for a reboot to fix it, can't say why it failed to stop though.
  8. Don't see any errors logged, but load average according to top is crazy high, one thing you can try is starting the dockers one at a time then wait a few hours before starting the next one to see if you find a culprit.
  9. This is very strange, and I don't see anything in the logs about libvirt issues, the only thing logged about that are the scheduled trims.
  10. Correct, but the fact that there was corruption on cache and sync errors points to a hardware issue, hence why I recommended running memtest, you can also run another check, if it finds more errors there's still a problem.
  11. Well, that's kind of obvious since that's what the setting does, any transfer to a share with cache=yes will look at minimum free space for cache and minimum free space for that share, whichever is highest is the one that will take effect for any transfer to that share.
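The rule described in the post above can be written down in a couple of lines. This is only an illustration of "whichever is highest takes effect"; the function name and the use of bytes as the unit are assumptions, and it does not reproduce how Unraid implements the setting internally.

```python
def effective_min_free(cache_min_free, share_min_free):
    """Minimum free space that applies to a transfer to a cache=yes share.

    Both values must be in the same unit (bytes here, as an assumption).
    Per the post, the higher of the two settings is the one that takes effect.
    """
    return max(cache_min_free, share_min_free)


GIB = 1024 ** 3

if __name__ == "__main__":
    # Example values chosen for illustration only
    cache_setting = 20 * GIB
    share_setting = 50 * GIB
    print(effective_min_free(cache_setting, share_setting) // GIB, "GiB")
```

With the illustrative values above, the share's 50 GiB setting wins because it is the higher of the two.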
  12. Look in the syslog, it will identify the corrupt file, then delete it or restore it from backups, or it will reference metadata if that's what's corrupt.
  13. It was the same, and as far as I remember, it has always been like that.
  14. Btrfs is also detecting data corruption on the cache pool, you should run memtest (and a scrub on cache).
  15. Didn't you see the warning in red letters that it's broken in v6.9.x? You just need to stop array, unassign both cache devices, start array so Unraid "forgets" cache assignments, re-assign both the OLD (original) cache devices, start array and the original pool should mount.
  16. The type and frequency of the ATA errors.
  17. Enable this then post that log after a crash.
  18. I guess you mean the disks as the title mentions :), I can't see that in the log, but it's normal for the webGUI to not function correctly after the flash drops.
  19. Even with an unclean shutdown that shouldn't happen. Yes, and assign a different drive to the same slot, then start array to rebuild.
  20. Likely nothing to worry about, but they appear limited to the Seagate 8TB drives, and while it's not the affected model this might still help:
  21. Not really, test with another source PC if available.
  22. There's a crash when detecting the device(s) on that controller, looks like some hardware compatibility issue, look for a BIOS update or try a different PCIe slot.

     May 16 13:23:28 Excalibur kernel: BUG: kernel NULL pointer dereference, address: 00000000000002f8
     May 16 13:23:28 Excalibur kernel: #PF: supervisor write access in kernel mode
     May 16 13:23:28 Excalibur kernel: #PF: error_code(0x0002) - not-present page
     May 16 13:23:28 Excalibur kernel: PGD 0 P4D 0
     May 16 13:23:28 Excalibur kernel: Oops: 0002 [#1] SMP NOPTI
     May 16 13:23:28 Excalibur kernel: CPU: 0 PID: 1630 Comm: scsi_eh_11 Not tainted 5.10.21-Unraid #1
     May 16 13:23:28 Excalibur kernel: Hardware name: System manufacturer System Product Name/ROG STRIX X470-F GAMING, BIOS 5406 11/13/2019
     May 16 13:23:28 Excalibur kernel: RIP: 0010:__ata_qc_complete+0x7c/0xd4
     May 16 13:23:28 Excalibur kernel: Code: 65 50 fd 48 c7 85 a0 00 00 00 00 00 00 00 f6 45 28 04 74 16 8b 4d 5c b8 fe ff ff ff d3 c0 41 21 84 24 fc 02 00 00 75 14 eb 0c <41> c7 84 24 f8 02 00 00 fd fc fb fa ff 8b 30 20 00 00 f6 45 50 20
     May 16 13:23:28 Excalibur kernel: RSP: 0018:ffffc90000583a00 EFLAGS: 00010046
     May 16 13:23:28 Excalibur kernel: RAX: 0000000000000000 RBX: ffff888105558000 RCX: 00000000fffbd60b
     May 16 13:23:28 Excalibur kernel: RDX: 0000000002c00000 RSI: 0000000000000082 RDI: 0000000000000082
     May 16 13:23:28 Excalibur kernel: RBP: ffff888105559f30 R08: 0000000000000000 R09: 000000000000000b
     May 16 13:23:28 Excalibur kernel: R10: 00000000fedbe000 R11: ffff888000000000 R12: 0000000000000000
     May 16 13:23:28 Excalibur kernel: R13: 0000000000000000 R14: ffff888105558000 R15: ffff888105559f30
     May 16 13:23:28 Excalibur kernel: FS: 0000000000000000(0000) GS:ffff88840e800000(0000) knlGS:0000000000000000
     May 16 13:23:28 Excalibur kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     May 16 13:23:28 Excalibur kernel: CR2: 00000000000002f8 CR3: 000000000200c000 CR4: 00000000003506f0
     May 16 13:23:28 Excalibur kernel: Call Trace:
     May 16 13:23:28 Excalibur kernel: ata_do_link_abort+0x5e/0x7c
     May 16 13:23:28 Excalibur kernel: ata_exec_internal_sg+0x339/0x492
     May 16 13:23:28 Excalibur kernel: ata_exec_internal+0x6b/0x8b
     May 16 13:23:28 Excalibur kernel: ata_read_log_page+0xf3/0x144
     May 16 13:23:28 Excalibur kernel: ata_log_supported+0x20/0x42
     May 16 13:23:28 Excalibur kernel: ata_dev_configure+0x9ff/0x1360
     May 16 13:23:28 Excalibur kernel: ata_eh_recover+0x1f7/0xf34
     May 16 13:23:28 Excalibur kernel: ? ata_phys_link_offline+0x51/0x51
     May 16 13:23:28 Excalibur kernel: ? sata_srst_pmp+0x27/0x27 [libahci]
     May 16 13:23:28 Excalibur kernel: ? ahci_do_hardreset+0x12d/0x12d [libahci]
     May 16 13:23:28 Excalibur kernel: ? ahci_stop_engine+0xc3/0xc3 [libahci]
     May 16 13:23:28 Excalibur kernel: sata_pmp_error_handler+0xf8/0x77c
     May 16 13:23:28 Excalibur kernel: ? update_cfs_rq_load_avg+0x14b/0x154
     May 16 13:23:28 Excalibur kernel: ? __cancel_work_timer+0x10a/0x15d
     May 16 13:23:28 Excalibur kernel: ? _raw_spin_lock_irqsave+0xf/0x29
     May 16 13:23:28 Excalibur kernel: ? lock_timer_base+0x33/0x56
     May 16 13:23:28 Excalibur kernel: ahci_error_handler+0x39/0x61 [libahci]
     May 16 13:23:28 Excalibur kernel: ata_scsi_port_error_handler+0x21b/0x522
     May 16 13:23:28 Excalibur kernel: ata_scsi_error+0x8c/0xb5
     May 16 13:23:28 Excalibur kernel: scsi_error_handler+0xa5/0x355
     May 16 13:23:28 Excalibur kernel: ? _raw_spin_unlock_irqrestore+0xd/0xe
     May 16 13:23:28 Excalibur kernel: ? scsi_eh_get_sense+0xf8/0xf8
     May 16 13:23:28 Excalibur kernel: kthread+0xe5/0xea
     May 16 13:23:28 Excalibur kernel: ? __kthread_bind_mask+0x57/0x57
     May 16 13:23:28 Excalibur kernel: ret_from_fork+0x22/0x30
     May 16 13:23:28 Excalibur kernel: Modules linked in: edac_mce_amd kvm_amd kvm wmi_bmof mxm_wmi igb crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd i2c_piix4 i2c_algo_bit glue_helper i2c_core input_leds led_class rapl k10temp wmi ccp ahci libahci button acpi_cpufreq
     May 16 13:23:28 Excalibur kernel: CR2: 00000000000002f8
     May 16 13:23:28 Excalibur kernel: ---[ end trace fbd658f20dd0c4e3 ]---
     May 16 13:23:28 Excalibur kernel: RIP: 0010:__ata_qc_complete+0x7c/0xd4
     May 16 13:23:28 Excalibur kernel: Code: 65 50 fd 48 c7 85 a0 00 00 00 00 00 00 00 f6 45 28 04 74 16 8b 4d 5c b8 fe ff ff ff d3 c0 41 21 84 24 fc 02 00 00 75 14 eb 0c <41> c7 84 24 f8 02 00 00 fd fc fb fa ff 8b 30 20 00 00 f6 45 50 20
     May 16 13:23:28 Excalibur kernel: RSP: 0018:ffffc90000583a00 EFLAGS: 00010046
     May 16 13:23:28 Excalibur kernel: RAX: 0000000000000000 RBX: ffff888105558000 RCX: 00000000fffbd60b
     May 16 13:23:28 Excalibur kernel: RDX: 0000000002c00000 RSI: 0000000000000082 RDI: 0000000000000082
     May 16 13:23:28 Excalibur kernel: RBP: ffff888105559f30 R08: 0000000000000000 R09: 000000000000000b
     May 16 13:23:28 Excalibur kernel: R10: 00000000fedbe000 R11: ffff888000000000 R12: 0000000000000000
     May 16 13:23:28 Excalibur kernel: R13: 0000000000000000 R14: ffff888105558000 R15: ffff888105559f30
     May 16 13:23:28 Excalibur kernel: FS: 0000000000000000(0000) GS:ffff88840e800000(0000) knlGS:0000000000000000
     May 16 13:23:28 Excalibur kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     May 16 13:23:28 Excalibur kernel: CR2: 00000000000002f8 CR3: 000000000200c000 CR4: 00000000003506f0
  23. I said it's not normal, but if it's always the same device it suggests a device problem, like it is corrupting the MBR/partition when it's not cleanly powered down/unmounted.