Everything posted by JorgeB

  1. Would need the diags to confirm, but they are likely being detected with a different size, one of the possible issues when using a RAID controller.
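     To compare what's being reported against the drive's native capacity you can check the size the kernel sees from the console, e.g. (device names are just examples):

     # size in bytes the kernel sees for one disk
     blockdev --getsize64 /dev/sdb
     # or list all disks with reported size and model
     lsblk -b -d -o NAME,SIZE,MODEL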
  2. This happened recently to a user and I just confirmed it's a bug, starting with v6.7.0: when, for example, removing a device from a 2 device raid1 pool, the command changed from:

     balance start -f -dconvert=single -mconvert=single /mnt/cache && /sbin/btrfs device delete /dev/sde1 /mnt/cache &

     to:

     balance start -f -dconvert=single /mnt/cache && /sbin/btrfs device delete /dev/sde1 /mnt/cache &

     The metadata isn't being converted to the single profile, so the device delete will fail, and Unraid will keep trying on every array start.
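     Until this is fixed a possible workaround is to do the conversion manually before removing the device, same as the commands Unraid used before v6.7.0 (pool path and device are just the examples from above, adjust to your setup):

     # convert both data and metadata to the single profile
     btrfs balance start -f -dconvert=single -mconvert=single /mnt/cache
     # then remove the device
     btrfs device delete /dev/sde1 /mnt/cache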
  3. Those call traces during parity check (and related bad performance) can usually be solved by lowering the md_sync_thresh value, though the current value is pretty low already.
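     The value is normally changed in Settings -> Disk Settings (Tunable (md_sync_thresh)); if you want to try a value on the fly it should also be possible with mdcmd, e.g. (value is just an example, and I'm assuming the stock mdcmd utility here):

     mdcmd set md_sync_thresh 192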
  4. The error is unrelated to the email, and I have no idea what's causing that; I only meant you should always replace a failing disk.
  5. It will work with SSDs that support read zeros after trim.
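     You can check if an SSD advertises it by looking at the TRIM section of the identify data, e.g. (device name is just an example):

     hdparm -I /dev/sdX | grep -i trim

     If supported it should list "Deterministic read ZEROs after TRIM".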
  6. Any pool requires btrfs; you can have multiple pools, with some limitations and the help of UD, until Unraid supports them natively.
  7. Please post the diagnostics: Tools -> Diagnostics
  8. That's definitely not a Dell H310; it looks like a SAS2LP, possibly a RocketRAID, there's at least one model that looks identical to the SAS2LP on the device list, but the missing drives issue appears to be limited to the Marvell 9230.
  9. Dell H310 uses an LSI chipset and will work fine.
  10. Both look fine, and the graphs look strange for a disk problem; grab the diags during the next parity check and post them, preferably in a new thread.
  11. Compression can be enabled; AFAIK btrfs deduplication is disabled in the kernel since it's still considered experimental, and from what I understand it currently doesn't work together with compression, so you can use one or the other.
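     A rough sketch of enabling compression, assuming the pool is mounted at /mnt/cache and you want zstd (existing data only gets compressed when it's rewritten):

     # set the compression property so new writes are compressed
     btrfs property set /mnt/cache compression zstd
     # or remount with the compress mount option
     mount -o remount,compress=zstd /mnt/cache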
  12. Keep only parity assigned after the new config and check "parity is already valid" before starting the array; after starting the array at least once, add parity2 and sync it.
  13. You can re-order disks and parity will still be valid, only parity2 will need re-syncing.
  14. Did a quick test, mostly out of curiosity, and if a smaller device is used first it's as I suspected: parity sync finishes successfully and is reported as valid, but it's only synced up to the size of the original device, so after that it will be out of sync (unless the disk was cleared). I also remember some cases that could have resulted from this bug, and also cases where multiple users reported a similar issue (parity completely out of sync after a certain point) after doing a parity-swap, but I can't see how that relates directly to this, so it's likely a different corner case/bug.
  15. This bug has likely existed for some time, I guess it's a corner case, but a user ran into it today. How to reproduce: say you have all 2TB data disks, upgrade parity to a larger disk, e.g. 3TB, start the array and cancel the parity sync, stop the array and replace the 3TB parity with a 2TB disk, start the array and the parity sync will start again but will still show the old 3TB size for total parity size (not the disk itself), then it will error out during the sync when it runs past the actual parity size with an error similar to this one:

     May 28 19:04:44 Tower9 kernel: attempt to access beyond end of device
     May 28 19:04:44 Tower9 kernel: sdc: rw=1, want=976773176, limit=976773168
     May 28 19:04:44 Tower9 kernel: md: disk0 write error, sector=976773104
     May 28 19:04:44 Tower9 kernel: attempt to access beyond end of device
     May 28 19:04:44 Tower9 kernel: sdc: rw=1, want=976773184, limit=976773168
     May 28 19:04:45 Tower9 kernel: md: disk0 write error, sector=976773112
     May 28 19:04:45 Tower9 kernel: md: recovery thread: exit status: -4

     This will result in the parity disk being disabled, and the user will need to sync it again. I guess there will also be a problem if a smaller disk is used first and then replaced with a larger one; likely parity will say valid but it won't be synced past the end of the smaller device.
  16. It does for 16 ports if you don't want to risk a bottleneck, not because it's 6gbps but because it's PCIe 2.0, though for WD Reds up to 6TB it shouldn't be much of one, but it can be for faster disks.
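     Rough numbers, assuming a PCIe 2.0 x8 HBA: 8 lanes x ~500MB/s per lane = ~4000MB/s, realistically more like ~3200MB/s usable, divided by 16 drives that's ~200MB/s per drive, fine for WD Reds up to 6TB (~180MB/s max) but tight for faster 7200rpm disks.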
  17. The cache filesystem is fully allocated, which will result in ENOSPC errors; see here for how to fix it. After that's done, delete and re-create the docker image. Edit: this might not be the only problem, but it's definitely one of them.
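     The usual fix is a filtered balance to reclaim the fully allocated chunks, something like this (assuming the pool is mounted at /mnt/cache):

     # check allocation vs. used space first
     btrfs fi usage /mnt/cache
     # rewrite data chunks that are less than 75% full to free up unallocated space
     btrfs balance start -dusage=75 /mnt/cache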
  18. LSI 9300-16i, though an 8 port HBA + expander might be cheaper on ebay.
  19. IIRC all the people affected are using the same Realtek NIC, so the change was likely the NIC driver in the new kernel.
  20. I don't mind posting it, but note that I know nothing about scripting, I'm just good at googling and finding examples of what I want to do, so the script is very crude, and while it works great for me and my use case it likely won't for other use cases. Also:
     -send/receive currently has no way of showing progress/transfer size, so I do it by using pv after comparing the used size on both servers; obviously this will only be accurate if both servers contain the same data, including the same snapshots, i.e., when I delete old snapshots on the source I also delete them on the destination.
     -you'll need to pre-create the ssh keys.
     -if any of the CPUs doesn't have hardware AES support remove "-c [email protected]" from the SSH options.
     -for the script to work correctly the most recent snapshot (the one used as parent for the incremental btrfs send) must exist on source and destination, so the initial snapshot for all disks needs to be sent manually, using the same name format.

     #!/bin/bash
     #Snapshot date format
     nd=$(date +%Y-%m-%d-%H%M)
     #Dest IP Address
     ip="192.168.1.24"
     #Share to snapshot and send/receive
     sh=TV
     #disks that have the share to snapshot and send/receive
     for i in {1..28} ; do
       #calculate and display send size
       s=$(BLOCKSIZE=1M df | grep -w disk$i | awk '/[0-9]%/{print $(3)}')
       su=$(BLOCKSIZE=1M df | grep -w user | awk '/[0-9]%/{print $(3)}')
       d=$(ssh root@$ip BLOCKSIZE=1M df | grep -w disk$i | awk '/[0-9]%/{print $(3)}')
       du=$(ssh root@$ip BLOCKSIZE=1M df | grep -w user | awk '/[0-9]%/{print $(3)}')
       t=$((s-d))
       if [ "$t" -lt 0 ] ; then ((t = 0)) ; fi
       g=$((t/1024))
       tu=$((su-du))
       if [ "$tu" -lt 0 ] ; then ((tu = 0)) ; fi
       gu=$((tu/1024))
       echo -e "\e[32mTotal transfer size for disk$i is ~"$g"GiB, total remaining for this backup is ~"$gu"GiB\e[0m"
       #source snapshots folder
       cd /mnt/disk$i/snaps
       #get most recent snapshot (matching the "share_date" name format)
       sd=$(echo "$sh"_* | awk '{print $NF}')
       #make a new snapshot and send differences from previous one
       btrfs sub snap -r /mnt/disk$i/$sh /mnt/disk$i/snaps/"$sh"_$nd
       sync
       btrfs send -p /mnt/disk$i/snaps/$sd /mnt/disk$i/snaps/"$sh"_$nd | pv -prtabe -s "$t"M | ssh -c [email protected] root@$ip "btrfs receive /mnt/disk$i"
       if [[ $? -eq 0 ]]; then
         ssh root@$ip sync
         echo -e "\e[32mdisk$i send/receive complete\e[0m"
         printf "\n"
       else
         echo -e "\e[31mdisk$i send/receive failed\e[0m"
         /usr/local/emhttp/webGui/scripts/notify -i warning -s "disk$i send/receive failed"
       fi
     done
     /usr/local/emhttp/webGui/scripts/notify -i normal -s "T5>T6 Sync complete"
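     For the ssh keys mentioned above, a minimal sketch (run on the source server, the IP is just the example from the script):

     # create a key pair if one doesn't exist yet
     ssh-keygen -t ed25519
     # copy the public key to the backup server (or append it to /root/.ssh/authorized_keys there manually if ssh-copy-id isn't available)
     ssh-copy-id root@192.168.1.24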
  21. It's an option; most of my media servers only have one share, so I just snapshot that and send/receive it to the backup server. I have a script that does an incremental send/receive for all disks in order.
  22. With xfs_repair -n, when there's nothing obvious in the output the only way to know if errors were detected is to check the exit status: 0 means no errors were detected, 1 means errors were detected.
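     For example (the md device is just an example, use the one for the disk being checked):

     xfs_repair -n /dev/md1
     echo $?   # 0 = no errors detected, 1 = errors detected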
  23. If there's no lost+found folder then nothing was moved there; the message always appears regardless. Drive failure and filesystem corruption are two very different things, and parity can't help with the latter, the same as it can't protect against accidental deletions or ransomware; that's what backups are for.
  24. Rebuilding a disk won't help with filesystem corruption, you need to check the filesystem on disk17: https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui or https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS It seems to be relatively common for some xfs disks to go unmountable after a kernel upgrade; it happened before and it's happening to some users now, likely the newer kernel is detecting some previously undetected corruption.
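     If you prefer the command line over the webGui, the equivalent, assuming the array is started in maintenance mode and that /dev/md17 is disk17's md device, would be something like:

     # check only first
     xfs_repair -n /dev/md17
     # then run the actual repair if needed
     xfs_repair /dev/md17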