
JorgeB

Moderators
  • Posts

    67,492
  • Joined

  • Last visited

  • Days Won

    706

Everything posted by JorgeB

  1. If the drives are not detected by the LSI BIOS, Unraid won't detect them either; try a different cable.
  2. It was reported on the log as an actual media error, and SMART suggests the disk has seen better days. Still, since the SMART test passed it's OK for now, but keep an eye on these attributes; if they keep increasing it will likely fail again soon:

     ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
       1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    501
     200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    41

     P.S. you also have an issue with one of your cache devices (cache1):

     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: [sdj] tag#5577 CDB: opcode=0x28 28 00 07 8b 60 68 00 00 28 00
     Jun 20 23:10:51 Tower kernel: scsi target7:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
     Jun 20 23:10:51 Tower kernel: scsi target7:0:2: enclosure logical id(0x500605b005d921f0), slot(2)
     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: task abort: SUCCESS scmd(00000000f0051dde)
     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: attempting task abort! scmd(00000000c3de18bf)
     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: [sdj] tag#5576 CDB: opcode=0x28 28 00 07 8b 60 40 00 00 20 00
     Jun 20 23:10:51 Tower kernel: scsi target7:0:2: handle(0x000b), sas_address(0x4433221102000000), phy(2)
     Jun 20 23:10:51 Tower kernel: scsi target7:0:2: enclosure logical id(0x500605b005d921f0), slot(2)
     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: task abort: SUCCESS scmd(00000000c3de18bf)
     Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: Power-on or device reset occurred
     Jun 20 23:10:51 Tower kernel: repair_io_failure: 5 callbacks suppressed
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 2781520 off 1757184 (dev /dev/sdj1 sector 275203400)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206307840 (dev /dev/sdj1 sector 70707144)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206438912 (dev /dev/sdj1 sector 70919640)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206324224 (dev /dev/sdj1 sector 70768960)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206303744 (dev /dev/sdj1 sector 70605544)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11686825984 (dev /dev/sdj1 sector 126595704)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11687796736 (dev /dev/sdj1 sector 126575096)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206344704 (dev /dev/sdj1 sector 70877376)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206340608 (dev /dev/sdj1 sector 70872680)
     Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206299648 (dev /dev/sdj1 sector 70445032)

     Replace cables or connect it to a different controller.
  3. There's a known issue with those NICs: they're not detected in most PCIe 2.0 or newer slots, e.g.: https://www.dell.com/community/PowerEdge-Hardware-General/Intel-PRO-1000-PT-PCI-E-Quad-EXPI9404PTBLK-T110/td-p/3881138
  4. The pool should show as new before array start, not after; now it's showing as unmountable. Those very high write numbers also suggest a problem. Please post the diagnostics: Tools -> Diagnostics
  5. SMART looks fine, complete diags might show something more.
  6. This is normal after a new config. As long as all the original pool members are assigned, the original pool will be imported (there can't be an "all data on this device will be deleted after array start" or similar warning for any of the devices).
  7. That won't matter, it will still show any errors and the filenames in the syslog; it's only when you try to access the file that it shows the checksum error but not the filename.
  8. That's normal if you're using the default system share, since it doesn't checksum the data, so also no way to verify or fix it, that's also mentioned in the FAQ link.
  9. That usually happens when a disk drops offline or there was another controller related issue, you should post the diags.
  10. This will happen if that disk had the fs set to "xfs" instead of "auto"; only disks set to "auto" will use the default filesystem from the disk settings. You can check by going to Main and clicking on the disk.
  11. You need to completely wipe the device, e.g., with blkdiscard, or remove the partition. If you just change the filesystem, Unraid will use the existing partition when re-formatting.
  12. Frequent ATA errors on the log, which result in freezes and can even drop devices in some cases, Asmedia is good as a 2 port controller, not good with port multipliers, for more ports get a JMB585 or an 8 port LSI HBA.
  13. It's considered safe enough for most if running raid1 metadata (the Unraid default); you should not use raid5/6 for metadata. I've been running several raid5 pools for more than a year without issues, but you should still have backups of anything important.
  14. Try this: https://forums.unraid.net/topic/77912-solved-cant-start-vm-after-restore/?do=findComment&comment=721029
  15. Please don't double post, locking this one, anyone responding please use the other thread:
  16. Did a test with a Windows VM to see if there was a difference with the new partition alignment. Total bytes written after 16 minutes (VM is idling doing nothing, not even internet connected):

      space_cache=v1, old alignment - 7.39GB
      space_cache=v2, old alignment - 1.72GB
      space_cache=v2, new alignment - 0.65GB

      So that's encouraging, though I guess that unlike v2 space cache the new alignment might work better for some NVMe devices and not make much difference for others. Still worth testing IMHO, since for some it should also give better performance. For this test I used an Intel 600p.
  17. Someone/something did, it's certainly not default, but yes you can remove it: on the main page click on flash, scroll down to Syslinux configuration, make sure it's set to "menu view" (upper right corner) and remove:

      xen-pciback.hide=(07:00.0)(08:00.0)(09:00.0)(0a:00.0)
  18. As long as there are no uncorrectable errors all should be fine.
  19. It would still be my first guess, could also be a disk, but that's not very likely and hopefully it's RAM since it would be easier to find/fix.
  20. Because you're hiding the controller from Unraid with that custom command line, it will hide any controllers/cards using those IDs, and the HBA is currently at 07:00.0
  21. Disk is dropping offline, did you also replace the power cable? If not do that, if yes, swap that disk's SATA and power cable with another one, preferably one connected to the onboard SATA controller and post new diags if/when it fails again.
  22. Problems with the NVMe device:

      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 698 QID 3 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 128 QID 4 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 0 QID 5 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 1 QID 5 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 2 QID 5 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 3 QID 5 timeout, aborting
      Jul 14 08:32:43 Tower kernel: nvme nvme0: I/O 5 QID 5 timeout, aborting
      Jul 14 08:33:13 Tower kernel: nvme nvme0: I/O 769 QID 1 timeout, reset controller
      Jul 14 08:33:44 Tower kernel: nvme nvme0: I/O 0 QID 0 timeout, reset controller

      This can sometimes help: Some NVMe devices have issues with power states on Linux, try this, on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right) and add this to your default boot option, after "append":

      nvme_core.default_ps_max_latency_us=0

      Reboot and see if it makes a difference.
  23. Scrub should always output the affected filenames to the log, it won't be on the log when a checksum error is detected during a normal file read, just the error without the filename.
  24. Again, if that was the case everyone using those NICs would have the same problem, no?
  25. I already mentioned that: The link I posted also says what you should do.
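
Regarding item 2, if you want a quick tally of how many corrected read errors each device and inode is getting, something like this rough Python sketch could work. It only assumes the log lines look like the "read error corrected" lines quoted in that post; the function name `count_corrected_reads` is made up for illustration and is not an Unraid or btrfs tool.

```python
import re
from collections import Counter

# Matches the "read error corrected" lines quoted in item 2.
PATTERN = re.compile(
    r"BTRFS info \(device (?P<fs>\S+)\): read error corrected: "
    r"ino (?P<ino>\d+) off (?P<off>\d+) "
    r"\(dev (?P<dev>\S+) sector (?P<sector>\d+)\)"
)


def count_corrected_reads(lines):
    """Return a Counter of corrected read errors keyed by (device, inode)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m.group("dev"), int(m.group("ino")))] += 1
    return counts


# Three BTRFS lines and one unrelated line, copied from the post.
sample = [
    "Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 2781520 off 1757184 (dev /dev/sdj1 sector 275203400)",
    "Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206307840 (dev /dev/sdj1 sector 70707144)",
    "Jun 20 23:10:51 Tower kernel: BTRFS info (device sdj1): read error corrected: ino 253968 off 11206438912 (dev /dev/sdj1 sector 70919640)",
    "Jun 20 23:10:51 Tower kernel: sd 7:0:2:0: Power-on or device reset occurred",
]
counts = count_corrected_reads(sample)
for (dev, ino), n in counts.most_common():
    print(f"{dev} inode {ino}: {n} corrected read error(s)")
```

If the count for one device keeps growing after swapping cables or controller, that points more towards the device itself.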
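
For the test in item 16, the relative savings are easier to compare as percentages. A small sketch of the arithmetic, using only the three numbers from that post (the helper name `reduction_pct` is just for illustration):

```python
# Total GB written after 16 minutes, taken from the test in item 16.
writes_gb = {
    "space_cache=v1, old alignment": 7.39,
    "space_cache=v2, old alignment": 1.72,
    "space_cache=v2, new alignment": 0.65,
}


def reduction_pct(before, after):
    """Percentage drop in bytes written going from `before` to `after`."""
    return (before - after) / before * 100.0


baseline = writes_gb["space_cache=v1, old alignment"]
for name, gb in writes_gb.items():
    pct = reduction_pct(baseline, gb)
    print(f"{name}: {gb:.2f} GB ({pct:.1f}% less than v1/old alignment)")
```

Keep in mind this is a single 16 minute run on one device (an Intel 600p), so the percentages are only indicative.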
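
For items 17 and 20, a quick way to check whether a given PCI address is caught by a `xen-pciback.hide=` entry is to parse the list out of the append line. This is only a sketch: the `append ... initrd=/bzroot` line below is an assumed example of what the line might look like, and `hidden_pci_addresses` is a made-up helper, not an Unraid command.

```python
import re


def hidden_pci_addresses(append_line):
    """Return the PCI addresses listed in a xen-pciback.hide=... parameter."""
    m = re.search(r"xen-pciback\.hide=((?:\([0-9a-fA-F:.]+\))+)", append_line)
    if not m:
        return []
    return re.findall(r"\(([0-9a-fA-F:.]+)\)", m.group(1))


# Assumed example append line; only the hide list comes from item 17.
append_line = (
    "append xen-pciback.hide=(07:00.0)(08:00.0)(09:00.0)(0a:00.0) initrd=/bzroot"
)
hidden = hidden_pci_addresses(append_line)

hba = "07:00.0"  # address of the HBA mentioned in item 20
if hba in hidden:
    print(f"{hba} is hidden from Unraid by the custom command line")
```

PCI addresses can change when cards are added or moved, which is how a leftover hide list can end up hiding a different device than originally intended.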
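
For item 22, the edit is normally done by hand in the GUI, but here is a sketch of what "add it to the default boot option, after append" means in terms of the file contents. The syslinux.cfg layout in `sample_cfg` is an assumption about a typical config and may differ from yours, and `add_param_to_default` is a hypothetical helper; only the parameter itself comes from the post.

```python
def add_param_to_default(cfg_text, param):
    """Insert `param` right after "append" in the label marked "menu default"."""
    lines = cfg_text.splitlines()
    # Start index of every "label" block, plus an end sentinel.
    starts = [i for i, l in enumerate(lines)
              if l.strip().lower().startswith("label ")]
    bounds = starts + [len(lines)]
    for start, end in zip(bounds, bounds[1:]):
        block = range(start, end)
        if not any(lines[i].strip().lower() == "menu default" for i in block):
            continue  # not the default boot option, leave it alone
        for i in block:
            stripped = lines[i].lstrip()
            if stripped.startswith("append ") and param not in stripped.split():
                indent = lines[i][: len(lines[i]) - len(stripped)]
                rest = stripped[len("append "):]
                lines[i] = f"{indent}append {param} {rest}"
    return "\n".join(lines)


# Assumed example layout, for illustration only.
sample_cfg = """default menu.c32
prompt 0
timeout 50
label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot
label Unraid OS GUI Mode
  kernel /bzimage
  append initrd=/bzroot,/bzroot-gui"""

PARAM = "nvme_core.default_ps_max_latency_us=0"
patched = add_param_to_default(sample_cfg, PARAM)
print(patched)
```

Only the label containing "menu default" is touched, and running it twice does not add the parameter a second time.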