fritzdis

Everything posted by fritzdis

  1. Yeah, I'll give the check a go, and then I guess I'll try the rebuild. If that succeeds, it's possible there was something about the unassigned drive (Toshiba MG07ACA14TE) that was causing an issue in the external enclosure, so I'll leave that out of the system for a while.
  2. Thanks. Seems to have mounted fine. Syslog from after attached. I guess it won't hurt to run a non-correcting parity check. But since that didn't trigger the issue last time, I'm still worried about how I will assess the hardware situation. sf-unraid-syslog-20240118-1607.zip
  3. Was not able to shut down cleanly. Removed unattached device and booted up. Here are the new diagnostics. All drives are present, with disk 7 disabled as you said. Also, since the shutdown was unclean, starting the array would trigger a parity check. However, since I suspect a hardware issue somewhere in the chain, I am hesitant to do much of anything without diagnosing the issue if possible. Unfortunately, I am not able to connect all the drives without the controller card. Also, that card is connected to a KTN-STL3 external enclosure, which means there are multiple potential issues (card, cable, enclosure), so I'm really not sure what my best next step is. sf-unraid-diagnostics-20240118-0829.zip
     Edit: I would say the card itself may be the most likely issue because I replaced the heatsink. However, as I said, I did run a full parity check after that without issue, so it's less of a sure thing. But if replacing the card entirely seems like the best move, I'm open to that.
  4. Yeah, I figured that was probably the case. What's weird is I ran a parity check a couple days ago without issue, and I can't think of what activity would have even been going on last night to trigger things, other than the preclear on the unattached device. In any case, I'll try to shut down cleanly (that isn't going well so far), remove the unattached drive, and boot back up for diagnostics.
  5. mount /dev/sdh1 /boot (after determining the USB drive must be sdh)
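     A less manual way to identify the flash device, rather than guessing at drive letters (a sketch, assuming the boot flash drive still carries the stock UNRAID filesystem label):
        # list block devices with labels to spot the flash drive
        lsblk -o NAME,LABEL,SIZE,FSTYPE
        # or mount it directly via the by-label symlink (assumes the default UNRAID label)
        mount /dev/disk/by-label/UNRAID /boot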
  6. I was able to remount the USB drive. Here are the additional log files. The repeated nginx errors are from leaving the webGUI open overnight by accident. Will reboot for diagnostics in a little while if there are no other suggestions. syslog1.txt syslog2.txt
  7. They exist. However, I removed the boot drive to copy the first one over to Windows, and I guess it does not remount when reinserted. So I'm not sure how to actually get those additional logs.
  8. Overnight, my server (on 6.12.6) apparently encountered significant errors, to the point where multiple drives became unavailable. I suspect this may be related to the HBA card to which the drives were connected. Also, I was running a preclear on a newly acquired (refurbished) drive connected to that HBA. Unfortunately, I am unable to collect diagnostics, even via the console directly on the server (it hangs indefinitely). Via the webGUI, it appears to get stuck on this command:
     sed -ri 's/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../' '/sf-unraid-diagnostics-20240118-0618/shares/appdata.cfg' 2>/dev/null
     This also makes the entire webGUI unresponsive. From the console, I attempted to capture the syslog with this command:
     cp /var/log/syslog /boot/syslog.txt
     While this did save something (see attached), it is quite incomplete. I believe it does not show the instigating event(s). Any suggestions on what to do next? syslog-manual.txt
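     If the diagnostics collector keeps hanging, a rough fallback (just a sketch, not the official diagnostics script) is to copy whatever plain log files still exist straight to the flash drive:
        # copy current and rotated syslogs plus the kernel ring buffer to the flash drive
        mkdir -p /boot/logs
        cp /var/log/syslog* /boot/logs/ 2>/dev/null
        dmesg > /boot/logs/dmesg.txt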
  9. Just in case you haven't already made your decision - I think it depends on use case. The 2nd linked thread above was mine. I still went ahead and converted all my drives, transferring 10s of TBs of data. It was certainly slow, but it should be a one-time process. I use SSD cache for most of my writes, with mover running daily. I'm sure it's not running as fast as XFS would, but I don't care because it's invisible to me.
  10. Yeah, the issue is the extra drop-off once write speed is below read speed (either from the source or limited by bwlimit). The only way to be SURE to avoid that is to go ridiculously low. There may be a sweet spot value for every individual drive where you spend a small enough amount of time in the "degraded speed" zone that the performance loss there is offset by faster overall speeds. Whether that's the case would depend on just how degraded the speed is.
  11. I believe this is because of what JorgeB said - ZFS writes don't seem to fill the drive in the way you would expect. This doesn't necessarily mean individual files are significantly fragmented, but it definitely makes the bwlimit approach tricky. I'll bet if you dropped all the way down to like 75000, you'd get close to that speed, since it's likely slower than the HDD's worst-case sequential speed. But of course, that could be leaving a lot of performance on the table if there are still big stripes of empty space on the outer tracks (I don't know if there's a way to "view" that for ZFS).
  12. Small follow up for reference: Even using the bwlimit option for rsync can be tricky. It's unclear, but it appears to me that even for a fresh disk, the writes may not always just work their way inward from the outer, faster tracks. So it can be hard to predict what speed the writes will be able to handle before the choppiness appears. The only surefire approach seems to be using a limit below the minimum sustained write speed of the disk, which sacrifices potential performance. If you periodically monitor it, you can somewhat dial it in better, but in that case, speeds could still drop off a cliff at any moment it seems.
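      For the periodic monitoring mentioned above, a crude sketch that samples the destination disk's write throughput from /proc/diskstats (sdX is a placeholder for the actual array device; Ctrl-C to stop):
         # print the disk's average write rate every 5 seconds (diskstats sectors are 512 bytes)
         DEV=sdX   # placeholder: replace with the destination disk
         while true; do
           s1=$(awk -v d="$DEV" '$3 == d {print $10}' /proc/diskstats)
           sleep 5
           s2=$(awk -v d="$DEV" '$3 == d {print $10}' /proc/diskstats)
           echo "$(date +%T)  $(( (s2 - s1) * 512 / 5 / 1000000 )) MB/s"
         done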
  13. Yeah, I don't imagine limiting mover speeds was ever a very common request.
  14. Good to know, thanks! For the big initial data transfer, it seems I can just use the bwlimit option for rsync to get reasonably good performance. After that, I'll have parity and cache. As I said, I won't be particularly concerned about array write speeds at that point. Is there any way to set a speed limit for the mover, since that may actually improve its performance?
  15. I'm seeing a write speed issue on a ZFS array disk (no parity) when the source disk is able to provide data faster than the destination disk can write it. In a controlled test, write speed was about 80 MB/s, whereas by limiting the source speed, write speed was over 170 MB/s. I would expect this to be reproducible on at least some systems. My system has dual Xeon E5-2450s (not V2). Their age and fairly weak single-core performance may have something to do with the issue, but given how drastic the difference is, I don't think CPU limitations fully explain it.
      Here are the details: I added a new drive (12TB Red Plus) to my unprotected array. It had no previous partition, and I let unRAID format it as ZFS (no compression). I created a share called "Test" on only that disk with no secondary storage. On my existing SSD cache pool, I have a "Download" share (shows as exclusive access). In that share, I copied several very large media files into a "_test" folder (83 GB total). At the command line, I navigated to the Download test folder and ran the following:
      rsync -h --progress --stats -r -tgo -p -l * /mnt/user/Test/
      This took 17m57s, for a speed of 79 MB/s (calculated, not just taking rsync's word for it). As you can see from the first image, write activity was very sporadic. During the dips, one or two CPU threads were pegged at 100% (usually two). Next, I removed the files from the Test share and reran the command, this time adding the --bwlimit option:
      rsync -h --progress --stats -r -tgo -p -l --bwlimit=180000 * /mnt/user/Test/
      I may have been able to go higher, but I wanted to make sure to use a speed under the max capabilities of the destination disk. It completed in just 8m12s, for a speed of 173 MB/s. The second image shows a drastically different profile for the write activity. Finally, just in case the order of operations had an impact, I removed the files again and reran the original command without the limit. The speed & activity profile was the same as before.
      I'm thinking of reformatting the test disk to BTRFS to see how it behaves (I still want to use ZFS long-term). I also don't know what will happen once I have dual parity, although using reconstruct write, I would expect to still be limited by this issue. I'm actually not that concerned about array write speed once I have the data populated, but the performance loss is so substantial that I thought I should report it in case anything can be done. Let me know if there's any additional testing I could do that would help. sf-unraid-diagnostics-20230526-2129.zip
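      For reference, the "calculated, not just taking rsync's word for it" speed can be reproduced with something like this sketch (same paths as above; the timing and byte count are measured outside rsync):
         # time the transfer and compute the average rate independently of rsync's summary
         start=$(date +%s)
         rsync -h --progress --stats -r -tgo -p -l --bwlimit=180000 * /mnt/user/Test/
         end=$(date +%s)
         bytes=$(du -sb /mnt/user/Test/ | cut -f1)
         echo "$(( bytes / (end - start) / 1000000 )) MB/s average"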
  16. Is it expected that writing to a ZFS array drive might be CPU-constrained? I'm seeing drops to zero while a couple cores on my old Xeon CPUs (weak single-threaded) are pegged at 100% for a bit. I don't have a parity drive at the moment (did a new config to start using ZFS drives), so I thought I'd have full write performance while filling the drives. The attached image shows write performance using Krusader, rsync (via LuckyBackup), and a simple copy command. If this is not the expected behavior, I'll make a separate thread with more detail.
  17. That didn't work either (says not running). But I was able to stop the array after the other scrub finished. I unmounted all the UD devices except the failed disk (which wouldn't unmount). The scrub errors kept appearing, so something was still going on in the background, but since the only remaining mounted disk was the failed one, I went ahead and shut down the system to remove the disk. Booting back up, there's nothing concerning in the syslog, so nothing more to do on my system. And this is a rare enough situation that I doubt it needs to be addressed (since the array could be stopped, which was my main concern). diagnostics-20230523-0910.zip
  18. It's a disk that was mounted through Unassigned Devices. Not part of cache or any pool.
  19. During a BTRFS scrub on an Unassigned Device, the disk started having issues and eventually dropped offline, it looks like. I guess it came back as a different device (sdv instead of sdn), but now the log is slowly filling with scrub errors relating to the original device. I've tried "btrfs scrub cancel" commands for /dev/sdn, /dev/sdn1, /dev/sdv, and /dev/sdv1. In each case, it says it's not a mounted btrfs device. Is there anything else I can try to cancel this scrub? I have another active scrub on a different disk that should finish soon. But I suspect that the scrub on the failed disk will continue and prevent cleanly stopping the array/shutting down. Presumably it would eventually finish, but I don't have a sense of the time that would take or whether errors would completely fill the log before doing so. diagnostics-20230522-1902.zip
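      In case it helps narrow things down, here is what I could run to see which btrfs filesystems the kernel still considers mounted, and under which device nodes (just a sketch of the commands):
         # list mounted btrfs filesystems and their device nodes
         btrfs filesystem show -m
         grep btrfs /proc/mounts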
  20. Thanks! The vast majority of my mappings are /something -> /mnt/user/something or similar (with or without trailing slash). I have a single /mnt/user mapping that I can change to separate mappings to the specific shares I need. So maybe I'll take the plunge on RC6 so I can get started on converting my data disks to ZFS one by one. As far as I can tell, this was the only known issue holding me back.
  21. I'm itching to upgrade, but I am concerned with the possibility of the "exclusive" shares feature breaking my existing structure. I know it's already been mentioned that options are being considered, but I wanted to add my support for, at the very least, an option to disable the functionality entirely. I have no need currently to improve share performance. If it could be toggled at the share level, that could be nice, but the ability to globally disable it is all I would really need for now, just so I could upgrade without worrying about immediately breaking something that's been working for years.
  22. Agreed, no way I would use the cache pool as is, given the errors and prior partition status. I've moved shares off the cache. There were some mover issues with Plex (seems like a known issue with broken symlinks). I've set all shares to cache : no, so the only thing left on the pool is the orphaned Plex files, which I haven't had time to deal with. Without any of the shares using the cache, I don't think I'm at risk any more, except for maybe Plex breaking, but I can just reinstall that if needed. Not sure if I want to trust the drive going forward, but at least now I can take my time (going out of town soon).
  23. OK, new (confusing) update: After another reboot (because the log file filled up from mover errors), here's the relevant output of lsblk -b:
      NAME    MAJ:MIN RM         SIZE RO TYPE MOUNTPOINT
      sdd       8:48   0 480103981056  0 disk
      └─sdd1    8:49   0 480103948288  0 part /mnt/cache
      Somehow it fixed itself? No more of the BTRFS errors in the log yet. I certainly don't trust the filesystem on the drive to be in a good state considering all the previous errors, so I'll continue clearing off the cache in order to reinitialize it.
  24. What do you think the chances are the drive will be totally fine? I might just set all the shares to cache yes to clear things off, then to cache no once it's cleared (until I have more time to deal with this). If I had thought of doing that earlier, I think I wouldn't have posted right away (hard not to panic a bit when errors like that show up). So assuming I'm able to move everything without issue, I'll probably mark this solved for now and follow up once I'm better able to address it.
  25. I also forgot that I just installed the My Servers plugin (before the reboot). But I was looking at the log because the flash drive didn't seem to back up immediately (it's fine now). I did not see those errors occurring, and I'm pretty sure I had dockers running at the time that would have triggered the drive issue if it existed before the reboot.