Everything posted by JorgeB

  1. Keep only parity assigned after the new config and check "parity is already valid" before starting the array. After the array has been started at least once, add parity2 and sync it.
  2. You can re-order disks and parity will still be valid; only parity2 will need re-syncing.
  3. Did a quick test, mostly out of curiosity, and if a small device is used first it's as I suspected: the parity sync finishes successfully and is reported as valid, but it's only synced up to the size of the original device, so beyond that point it will be out of sync (unless the disk was cleared). I also remember some cases that could have resulted from this bug, as well as cases where multiple users reported a similar issue (parity completely out of sync after a certain point) after doing a parity-swap, but I can't see how that relates directly to this, so it's likely a different corner case/bug.
  4. This bug has likely existed for some time, and I guess it's a corner case, but a user ran into it today. How to reproduce: say you have all 2TB data disks and upgrade parity to a larger disk, e.g. 3TB. Start the array and cancel the parity sync, stop the array and replace the 3TB parity with a 2TB disk, then start the array again. The parity sync will start again but will still show the old 3TB size as the total parity size (not the disk itself), then error out during the sync when it runs past the actual parity size, with an error similar to this one:

     May 28 19:04:44 Tower9 kernel: attempt to access beyond end of device
     May 28 19:04:44 Tower9 kernel: sdc: rw=1, want=976773176, limit=976773168
     May 28 19:04:44 Tower9 kernel: md: disk0 write error, sector=976773104
     May 28 19:04:44 Tower9 kernel: attempt to access beyond end of device
     May 28 19:04:44 Tower9 kernel: sdc: rw=1, want=976773184, limit=976773168
     May 28 19:04:45 Tower9 kernel: md: disk0 write error, sector=976773112
     May 28 19:04:45 Tower9 kernel: md: recovery thread: exit status: -4

     This will result in the parity disk being disabled, and the user will need to sync it again. I suspect there will also be a problem if a small disk is used first and then replaced with a larger one: parity will likely be reported as valid, but it won't be synced past the end of the smaller device.
  5. It does for 16 ports if you don't want to risk a bottleneck, not because it's 6Gbps SATA but because it's PCIe 2.0: an x8 PCIe 2.0 link tops out around 4GB/s, which is roughly 250MB/s per disk with all 16 in use. For WD Reds up to 6TB that shouldn't be much of a bottleneck, but it can be for faster disks.
  6. The cache filesystem is fully allocated, which will result in ENOSPC errors; see here for how to fix it. After that's done, delete and re-create the docker image. Edit: this might not be the only problem, but it's definitely one of them.
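     A minimal sketch of the usual fix, assuming the cache is a btrfs pool mounted at the standard /mnt/cache (the linked post has the full procedure):

         # Confirm the problem: space allocated at or near the device size
         # while the actual data usage is much lower
         btrfs filesystem usage /mnt/cache
         # Reclaim allocated-but-mostly-empty chunks with a filtered balance;
         # start with a low usage filter and raise it if nothing is freed
         btrfs balance start -dusage=50 /mnt/cache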
  7. LSI 9300-16i, though an 8 port HBA + expander might be cheaper on ebay.
  8. IIRC all the people affected are using the same Realtek NIC, so the change was likely the NIC driver in the new kernel.
  9. I don't mind posting it, but note that I know nothing about scripting; I'm just good at googling and finding examples of what I want to do, so the script is very crude, and while it works great for me and my use case it likely won't for other use cases. Also:
     - send/receive currently has no way of showing progress/transfer size, so I do it by using pv after comparing the used size on both servers; obviously this will only be accurate if both servers contain the same data, including the same snapshots, i.e., when I delete old snapshots on source I also delete them on destination.
     - you'll need to pre-create the ssh keys.
     - if any of the CPUs doesn't have hardware AES support, remove the "-c" cipher option from the ssh command.
     - for the script to work correctly the most recent snapshot (the one used as parent for the incremental btrfs send) must exist on source and destination, so the initial snapshot for all disks needs to be sent manually, using the same name format (see the sketch after the script).

     #!/bin/bash
     # Snapshot date format
     nd=$(date +%Y-%m-%d-%H%M)
     # Dest IP address
     ip="192.168.1.24"
     # Share to snapshot and send/receive
     sh=TV
     # Disks that have the share to snapshot and send/receive
     for i in {1..28} ; do
         # Calculate and display send size
         s=$(BLOCKSIZE=1M df | grep -w disk$i | awk '/[0-9]%/{print $(3)}')
         su=$(BLOCKSIZE=1M df | grep -w user | awk '/[0-9]%/{print $(3)}')
         d=$(ssh root@$ip BLOCKSIZE=1M df | grep -w disk$i | awk '/[0-9]%/{print $(3)}')
         du=$(ssh root@$ip BLOCKSIZE=1M df | grep -w user | awk '/[0-9]%/{print $(3)}')
         t=$((s-d))
         if [ "$t" -lt 0 ] ; then ((t = 0)) ; fi
         g=$((t/1024))
         tu=$((su-du))
         if [ "$tu" -lt 0 ] ; then ((tu = 0)) ; fi
         gu=$((tu/1024))
         echo -e "\e[32mTotal transfer size for disk$i is ~"$g"GiB, total remaining for this backup is ~"$gu"GiB\e[0m"
         # Source snapshots folder
         cd /mnt/disk$i/snaps
         # Get most recent snapshot; note "$sh"_* so the share name expands
         # correctly ($sh_* would be read as the unset variable $sh_)
         sd=$(echo "$sh"_* | awk '{print $NF}')
         # Make a new snapshot and send differences from the previous one;
         # the -c cipher name here is assumed, use any AES-GCM cipher
         # supported on both ends
         btrfs sub snap -r /mnt/disk$i/$sh /mnt/disk$i/snaps/"$sh"_$nd
         sync
         btrfs send -p /mnt/disk$i/snaps/$sd /mnt/disk$i/snaps/"$sh"_$nd | pv -prtabe -s "$t"M | ssh -c aes128-gcm@openssh.com root@$ip "btrfs receive /mnt/disk$i"
         if [[ $? -eq 0 ]]; then
             ssh root@$ip sync
             echo -e "\e[32mdisk$i send/receive complete\e[0m"
             printf "\n"
         else
             echo -e "\e[31mdisk$i send/receive failed\e[0m"
             /usr/local/emhttp/webGui/scripts/notify -i warning -s "disk$i send/receive failed"
         fi
     done
     /usr/local/emhttp/webGui/scripts/notify -i normal -s "T5>T6 Sync complete"
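     For reference, a minimal sketch of that initial manual send for a single disk, assuming the same layout the script above uses (disk1, share TV, snapshots under /mnt/disk1/snaps, destination 192.168.1.24):

         # Create the snapshots folder and the first read-only snapshot,
         # using the same "share"_date name format the script expects:
         mkdir -p /mnt/disk1/snaps
         btrfs sub snap -r /mnt/disk1/TV /mnt/disk1/snaps/TV_$(date +%Y-%m-%d-%H%M)
         sync
         # Full (non-incremental) send; later runs of the script will use
         # this snapshot as the parent for the incremental -p send:
         btrfs send /mnt/disk1/snaps/TV_* | ssh root@192.168.1.24 "btrfs receive /mnt/disk1"

     The TV_* glob is only safe here because this is the first snapshot on the disk; with more than one, pass the exact snapshot name.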
  10. It's an option; most of my media servers only have one share, so I just snapshot that and send/receive it to the backup server. I have a script that does an incremental send/receive for all disks in order.
  11. With xfs_repair -n, when there's nothing obvious in the output, the only way to know if errors were detected is to check the exit status: 0 means no errors were detected, 1 means errors were detected.
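      A quick sketch, assuming disk1 with the array started in maintenance mode (/dev/md1 is the usual device path in that case; adjust for your disk):

          # Read-only check; -n makes no modifications
          xfs_repair -n /dev/md1
          # The exit status is the reliable signal: 0 = clean, 1 = errors found
          if [ $? -eq 0 ]; then
              echo "no errors detected"
          else
              echo "errors detected, re-run xfs_repair without -n to repair"
          fi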
  12. If there's no lost+found folder then nothing was moved there; the message always appears regardless. Drive failure and filesystem corruption are two very different things: parity can't help with the latter, the same as it can't protect against accidental deletions or ransomware. That's what backups are for.
  13. Rebuilding a disk won't help with filesystem corruption; you need to check the filesystem on disk17: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui or https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS It seems to be relatively common for some XFS disks to become unmountable after a kernel upgrade; it happened before and it's happening to some users now. The newer kernel is likely detecting some previously undetected corruption.
  14. That's normal; it just means dd reached the end of the device.
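      For context, a hedged example of what that looks like, assuming the disk was being zeroed with dd (device name hypothetical):

          # Write zeros across the whole device
          dd if=/dev/zero of=/dev/sdX bs=1M status=progress
          # When dd hits the end of the disk it exits with something like:
          #   dd: error writing '/dev/sdX': No space left on device
          # That message is expected: the whole device was written.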
  15. Yes, for the cache pool; trim always worked for the cache pool, it just doesn't work for array devices.
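      A minimal example of a manual trim, assuming the pool is mounted at the standard /mnt/cache:

          # Trim unused blocks on the cache pool; -v reports how much was trimmed
          fstrim -v /mnt/cache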
  16. Those disks are on USB and don't even have accurate SMART reports; maybe make FCP ignore any disk connected by USB, if that's an option.
  17. This seems to be working for most.
  18. See just 4 or 5 posts above for a possible solution.
  19. There is a known issue with the 9230 Marvell controller and the newer kernels; you can usually get around it with this, but better yet, use the Intel onboard ports instead, since they are all unused. The Marvell controller (first 4 white SATA ports) on those boards is known to drop disks:

      May 16 17:14:57 Basestar kernel: ata3: SATA link down (SStatus 0 SControl 300)
      May 16 17:14:57 Basestar kernel: ata2: SATA link down (SStatus 0 SControl 300)
      May 16 17:14:57 Basestar kernel: ata5: SATA link down (SStatus 0 SControl 300)
      May 16 17:14:57 Basestar kernel: ata4: SATA link down (SStatus 0 SControl 300)
      May 16 17:14:57 Basestar kernel: ata1: SATA link down (SStatus 0 SControl 300)
      May 16 17:14:57 Basestar kernel: ata6: SATA link down (SStatus 0 SControl 300)
  20. If you don't need IOMMU you can always disable it in the BIOS; that should get rid of those errors.
  21. Not sure, you can check in the plugin thread. Disable it in the BIOS. No, that's the trimmed capacity, i.e., the free space. At least once a week is good practice, as sketched below.
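      A minimal sketch of scheduling that with a plain cron entry, assuming the cache pool at /mnt/cache (the Dynamix SSD TRIM plugin does the same thing from the GUI):

          # Trim the cache pool every Sunday at 04:00
          0 4 * * 0 /sbin/fstrim -v /mnt/cache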
  22. No, and they usually perform well for most uses.
  23. Looks like a kernel bug with some Ryzen systems and NVMe devices with IOMMU enabled: https://bugzilla.kernel.org/show_bug.cgi?id=202665
  24. And @olschoolplease post back here if they have a solution.