Everything posted by JorgeB

  1. When the checksums are done by the filesystem, like zfs or btrfs, they are done block by block, not per file; when done by, for example, the File Integrity plugin, they are done file by file, and they are always the same size and very small, so they fit in the extended attributes. There are various ways to protect against that; I, for example, use snapshots, which are read-only and cannot be modified by those kinds of attacks.
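     A minimal sketch of how to look at both kinds of checksums from the console; the mount point, file path, and the exact extended attributes written by the File Integrity plugin are assumptions, check your own setup:
     # Scrub a btrfs pool: reads every block and verifies it against the block-level checksums
     btrfs scrub start /mnt/cache        # /mnt/cache is an assumed pool mount point
     btrfs scrub status /mnt/cache       # progress and any checksum errors found so far
     # Show the per-file hashes a tool like the File Integrity plugin stores in extended attributes
     getfattr -d /mnt/disk1/some/file    # path is a hypothetical example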
  2. If this was interrupted you need to run it again. Since you replaced the disk you can run it on the old, now unassigned, disk to avoid having the array offline, assuming you have enough ports, and then copy the data over; in that case, take the opportunity to format the new disk as xfs.
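     A minimal sketch of the copy step, assuming the old disk is mounted with Unassigned Devices and the new xfs-formatted disk is array disk1; both paths are hypothetical:
     # Copy everything to the new array disk, preserving attributes (-a) and extended attributes (-X), with progress (-P)
     rsync -aPX /mnt/disks/old_disk/ /mnt/disk1/
     # Compare used space before clearing the old disk
     df -h /mnt/disks/old_disk /mnt/disk1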
  3. Bitrot is extremely rare, and the drives have error correction, but it can happen. IMHO checksums are mostly useful for when, for example, an issue occurs during a disk rebuild, like errors on another disk, so you can see if any files were affected, and which ones.
  4. Enable the syslog server and post that after a crash.
  5. Upsides: mainly checksum and snapshot support. Downsides: not as resilient as xfs, especially with bad hardware, and recovery in case of serious corruption might be more difficult, though in my experience (I have around 200 btrfs filesystems, all except about a dozen of them single device) single-device filesystems, like the ones used in the array, are more resilient against corruption than multi-device filesystems; not against a disk failure, obviously, for that you use parity in the array.
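     A minimal sketch of the read-only snapshot idea mentioned above, assuming a btrfs pool mounted at /mnt/cache with a subvolume named data; the names and paths are hypothetical:
     # Create a read-only snapshot; anything writing over the network cannot modify it
     btrfs subvolume snapshot -r /mnt/cache/data /mnt/cache/snaps/data_20240801
     # List existing subvolumes and snapshots on the pool
     btrfs subvolume list /mnt/cache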
  6. No filesystem corruption detected so far. The first thing to do is to check whether the files are there or not: browse the shares using, for example, Midnight Commander (mc on the console); if they are there you should then check the permissions.
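     A minimal sketch for checking from the console; the share name is a hypothetical example and the ownership shown (nobody:users) is the usual Unraid default, adjust to your case:
     mc                                      # browse /mnt/disk* and /mnt/user in Midnight Commander
     ls -la /mnt/user/MyShare                # check owner, group and permissions of the files
     # One way to reset them if they are wrong, run against a specific share only
     chown -R nobody:users /mnt/user/MyShare
     chmod -R u+rw,g+rw /mnt/user/MyShare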
  7. Please post the diagnostics, though anything that happened before this last boot cannot be seen.
  8. Aug 3 02:15:23 black shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.
     It's the issue below. Some workarounds are discussed there, mostly disabling NFS if not needed, or you can change everything over to SMB; it can also be caused by Tdarr, if you use that.
  9. If the drives were disabled when the ports failed, they need to be rebuilt; just switching to a different controller would not by itself require a rebuild.
  10. Yes, both checks were correct and found the same errors. This might suggest a problem with one of the disks; since there were several ATA errors with disk3, replace the cables there and run another check.
  11. The automatic parity check after an unclean shutdown is non-correcting; the last one you ran was a correcting check, so there should be no more errors on the next check.
  12. Yes, looks OK now. Note that the pool is in the single profile, i.e., no redundancy, but you can use the total capacity of both devices. If you want to convert to raid1 (redundant, but only 120GB can be used since they have different capacities) you first need to remove some data from cache1.
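     A minimal sketch of the raid1 conversion once enough data has been removed, assuming the pool is mounted at /mnt/cache (the same conversion can usually be started from the pool's balance section in the GUI):
     # Convert data and metadata from single to raid1; runs in the background, pool stays usable
     btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/cache
     # Verify the profiles and free space afterwards
     btrfs filesystem usage /mnt/cache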
  13. The filesystem is going read-only due to lack of space, but I'm not seeing why it's running out of space at mount time; this usually happens when a balance was started, but there are no signs of "balance resumed" in the log. In any case please try this, with the array stopped, type:
     mkdir /temp
     mount -o skip_balance /dev/sdf1 /temp
     btrfs balance cancel /temp
     umount /temp
     Then start the array and post new diags.
  14. Run chkdsk on it but the flash should be OK.
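     A minimal sketch, assuming the flash drive shows up as drive E: when plugged into a Windows machine (the drive letter is an assumption):
     chkdsk E: /f    # check the FAT filesystem on the flash drive and fix any errors found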
  15. That's expected, once a disk gets disabled it needs to be rebuilt: https://wiki.unraid.net/Manual/Storage_Management#Rebuilding_a_drive_onto_itself
  16. Please reboot and post new diags after array start.
  17. Aug 1 12:35:11 Tower kernel: general protection fault, probably for non-canonical address 0x9c0101000034: 0000 [#1] SMP PTI
     Aug 1 13:01:44 Tower kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
     A couple of days ago there was the same error as last time, and then IRQ 16 also got disabled; possibly not a big issue, since the server worked for 2 more days. The main issue was that this USB controller stopped working:
     Aug 3 12:35:33 Tower kernel: xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
     The flash drive was using it, so after that Unraid cannot continue to work correctly.
  18. Start here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=819173
  19. SMART cannot always identify issues. I would suggest swapping cables with a different disk again to make sure, and if it happens again, replace the disk.
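     If you want to give the disk a closer look beyond the attributes, a minimal sketch using smartctl; /dev/sdX is a placeholder for the actual device:
     smartctl -a /dev/sdX        # full SMART attributes and error log
     smartctl -t long /dev/sdX   # start an extended self-test (can take several hours)
     smartctl -a /dev/sdX        # check the self-test result once it has finished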
  20. There wasn't a valid btrfs filesystem on sdu, suggesting the device was wiped at some point; because of that Unraid is first deleting the missing device. I do see a lot of these errors logged:
     Aug 3 13:13:34 Anderson kernel: BTRFS warning (device sds1): direct IO failed ino 119162 rw 0,0 sector 0xfe1a280 len 4096 err no 10
     Aug 3 13:13:34 Anderson kernel: BTRFS warning (device sds1): direct IO failed ino 119162 rw 0,0 sector 0xfe1a288 len 4096 err no 10
     Aug 3 13:13:34 Anderson kernel: BTRFS warning (device sds1): direct IO failed ino 119162 rw 0,0 sector 0xfe1a290 len 4096 err no 10
     Not sure what these mean exactly, but since the balance is going, let's see if it finishes.
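     A minimal sketch for keeping an eye on that balance from the console, assuming the pool is mounted at /mnt/poolname (the mount point is a placeholder):
     btrfs balance status /mnt/poolname   # shows how many chunks are left to process
     btrfs device stats /mnt/poolname     # per-device error counters, worth watching while it runs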
  21. Forgot to mention: sdak was likely the old disk9 and it was replaced, so it's the same filesystem, but an older generation.
  22. Aug 3 12:22:46 Anderson root: WARNING: adding device /dev/sdak1 gen 3145 but found an existing device /dev/sdj1 gen 3159
     sdj is disk9, sdak is currently unassigned. Wipe or disconnect sdak, since it appears to be conflicting with the pool, then please reboot and post new diags after array start.
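     A minimal sketch of the wipe from the console; double-check the device name first, since this destroys the filesystem signature on it (sdak1 is taken from the log line above):
     blkid /dev/sdak1        # confirm this really is the stale btrfs member before wiping
     wipefs -a /dev/sdak1    # remove the filesystem signatures so it no longer conflicts with the pool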