
bsim

Posts posted by bsim

  1. Scratching my head on this one...unraid fluke?

     

    Dual 12TB parity, Unraid 6.12.6, all XFS drives, monthly checks, 0 errors on the last parity check.

     

    1. I precleared a 5TB that I've had as a hot spare for my array: 0 errors, and it shows up in Unassigned Devices.

    2. Removed a SMART-failed (but still working) 4TB from the array (10 uncorrectables).

    3. Replaced it with the pre-cleared 5TB.

    4. Started the array with a rebuild: 0 errors on the rebuild.

    5. A different 5TB drive got a red X!?

         ****That drive does have several uncorrectables****

     

     

    Will a drive get a red X and be emulated if it has too many uncorrectables, without showing any errors during the rebuild?
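    For reference, this is how I'm checking the counts on the red-X'ed drive from the console; just a generic smartctl sketch (the device name sdX is a placeholder, not the actual assignment):

        # Full SMART attribute table; 187/197/198 are reported uncorrectables,
        # pending sectors, and offline uncorrectables.
        smartctl -A /dev/sdX

        # Narrow the output down to just those attributes.
        smartctl -A /dev/sdX | grep -E 'Reported_Uncorrect|Current_Pending_Sector|Offline_Uncorrectable'

        # Overall health self-assessment as well.
        smartctl -H /dev/sdX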

     

     

  2. I've read extensively on the forums and on Google, but haven't found someone with my exact, relatively simple use case...

     

    I have multiple large external hard drives (22TB each), and I have extensive rsync scripts that run automatic backups (via an Unassigned Devices script) whenever a drive is powered on, every month or so.

     

    My question is: what filesystem should I use for the external drives?

     

    1. I would love to turn my current changed-files-only backup into one that verifies the integrity of backed-up files using filesystem checksums and refreshes any file where bitrot is detected.
    2. I'm not extremely worried about accessing the drives from outside machines (Windows/NTFS), and I definitely don't have any computational limitations.

     

    I'm thinking BTRFS and ZFS would be my best bet, but:

     

    1. Can rsync backups use the FS checksumming built into these filesystems to determine differences? (Is there a better option than rsync?)
    2. Does FS checksumming in BTRFS/ZFS happen automatically in the background without complications (outside of the Unassigned Devices GUI)?
    3. Is scriptable command-line FS checking easy for BTRFS/ZFS? (I currently use xfs_repair for XFS; see the sketch below.)
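    To make question 3 concrete, this is the kind of scripted check I have in mind; a rough sketch assuming a BTRFS-formatted external mounted at /mnt/disks/backup1, a ZFS pool named backuppool, and a source share at /mnt/user/share (all three names are just examples):

        # BTRFS: scrub verifies every block against its checksum; then report
        # the corrected/uncorrectable counts.
        btrfs scrub start -B /mnt/disks/backup1
        btrfs scrub status /mnt/disks/backup1

        # ZFS: same idea at the pool level.
        zpool scrub backuppool
        zpool status backuppool

        # rsync can't read the filesystem's own checksums; -c makes it compute
        # its own checksums on both ends instead of trusting size/mtime.
        rsync -avc --delete /mnt/user/share/ /mnt/disks/backup1/share/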

     

     

     

  3. I see the point of being careful with automatic parity corrections, but my hardware has been stable and the system has worked flawlessly for years. Just every once in a while I get a burst of sync errors on a 140TB array, and the number of errors doesn't seem like a major issue compared with the potential problems automatic parity correction would save me from.

     

    I considered installing some type of indexing/checksum software to watch for bit rot or actual corruption... just haven't gotten around to it.
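    What I have in mind is nothing fancier than a checksum manifest; a rough sketch (the share name and manifest path are just examples):

        # Build a manifest of checksums for every file on a share.
        mkdir -p /boot/manifests
        cd /mnt/user/Media
        find . -type f -print0 | xargs -0 md5sum > /boot/manifests/media.md5

        # Later, re-verify and print only files that fail the check.
        cd /mnt/user/Media
        md5sum -c --quiet /boot/manifests/media.md5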

     

    It would be awesome if there were a way to translate the location of the incorrect bits to at least a controller/drive/file... that would help greatly in my case. I don't see why the main Unraid driver couldn't spit out the details of the parity issue when doing corrections; it seems like it would be a great diagnostic tool.

  4. Are corrected parity sync errors truly corrected, or could there be some sort of hidden corruption?

     

    1 hour ago, trurl said:

    not caused by some drive or other hardware issue.

     

    The errors are not recurring, and I often go several months/checks with no errors detected. The hardware has been stable and unchanged for years now. If I can't determine the issue using SMART, obvious Unraid errors, or any log files, then why wouldn't a correcting parity check just save me time?

  5. Running the latest Unraid Pro 6.10.3 with dual parity, using mirrored SSDs (240GB, mover runs nightly) as cache drives...

     

    I run a large array and run parity checks automatically every month. Most times I get no parity errors, but sometimes I get a few thousand corrected parity errors.

     

    I have a UPS that does a graceful shutdown (though I guess it's possible the shutdown process takes longer than the UPS can hold out while waiting for Unraid to stop the array). I do have power outages, but the UPS can run for about 20 minutes before it tells Unraid to issue a shutdown.

     

    No drives have red balls or show any issues with SMART attributes 5, 187, 188, 197, or 198 (the Backblaze-recommended set).

     

    The physical server has not been moved/opened in several months.

     

    Two questions:

    1.) What files in the diagnostics download (saved immediately after the sync errors) would show me which files/drives reported the sync errors? What am I looking for in those files that would tell me the details? (See the grep sketch after question 2.)

     

    2.) Do corrected sync parity errors (with dual parity) mean that the data was corrected and no corruption has occurred?
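    For question 1, the only place I know to look so far is the syslog inside the diagnostics zip; a rough grep sketch (the zip filename is just an example, and the exact wording of the corrected-sector lines may differ by Unraid version):

        # Unpack the diagnostics and search every log for parity-related lines.
        unzip unraid-diagnostics.zip -d diag
        grep -riE 'parity|sync error|corrected' diag/ | less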

     

     

     

  6. While reviewing the syslog, I found that these lines have been repeating, thousands of times, for several months now... no real problems with the server, but I can't find any explanation, or anyone else with the same problem, to tell whether this is a client connection issue or a server-side issue.

     

    Nov 1 03:23:11 UNRAID smbd[17025]: [2021/11/01 03:23:11.874565, 0] ../../source3/smbd/smbXsrv_client.c:663(smbXsrv_client_connection_pass_loop)
    Nov 1 03:23:11 UNRAID smbd[17025]: smbXsrv_client_connection_pass_loop: got connection sockfd[847]

     

    Anyone have any ideas?
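    For scale, this is the quick count I'm running against the live syslog (plain grep/awk, nothing Unraid-specific):

        # Total occurrences in the current syslog.
        grep -c 'smbXsrv_client_connection_pass_loop' /var/log/syslog

        # Break the count down per day to see whether it's getting worse.
        grep 'smbXsrv_client_connection_pass_loop' /var/log/syslog | awk '{print $1, $2}' | sort | uniq -c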

  7. I would think that if the drive wasn't part of the array (abruptly disconnected), it would have just errored out... If I hadn't started from the basics to figure out why the drive was being wonky, I could have sworn it was a very specific hard drive defect. The fact that the utility was writing data directly into the /dev folder, and that it was being written as a file rather than a device, is really screwy.

     

    All three drives are now pre-clearing without issue.

  8. Never mind... it has to be a bug in the preclear plugin.

     

    By chance I SSH'ed into Unraid (WinSCP) and found that /dev/sdac was actually a 30GB file.

    Deleted the file, unplugged and replugged the SATA connection for that drive, and voilà!

     

    My problem must be a bug or unhandled case in the preclear script. Previously, when I was preclearing the drives, I had unplugged one of them mid-preclear... and even after attempting to stop the preclear on that drive, it must have choked on something. After wiping out that /dev file, this came up in the preclear!
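    For anyone who hits the same thing, a quick way to confirm from the console whether a /dev entry is a real device node or a stray regular file (a generic sketch, nothing preclear-specific):

        # A real disk is a block device ("b" at the start of the mode, with
        # major/minor numbers instead of a byte size); the broken entry here
        # was a plain 30GB file ("-").
        ls -l /dev/sdac

        # Scripted version of the same check.
        if [ -b /dev/sdac ]; then echo "block device"; else echo "NOT a device node"; fi

        # The kernel's view of attached disks, for comparison.
        lsblk -o NAME,SIZE,TYPE,MODEL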

     

     

     

    Preclear Bug After Deleting sdac file and reset sata cable.jpg

    sdac error.jpg

  9. Racking my brain on this one...

    3 new 12TB drives (identical shucked drives; they do not have the power-disable pin issue).

    Verified SMART OK on a separate Windows machine BEFORE shucking (HD Tune Pro): all three OK.

    Verified SMART OK on a separate Windows machine AFTER shucking (HD Tune Pro): all three OK.

    Wiped all partitions and ran read and write tests on all three: no problems.

     

    Attached the 3 drives to the server; the preclear plugin identified 2 fine, but the 3rd drive does not show an "identity", only "_" (pics attached).

    Switched the SATA connection with one of the working drives: the 3rd drive still does not identify.

    Switched the power connection: the 3rd drive still does not identify.

     

    From the command line, I attempted to manually pull SMART (smartctl -H /dev/sdac):

    2 drives returned PASSED; the possibly bad drive returned "INQUIRY failed".

     

    Is this just an oddly bad drive or is there some sort of cached drive data going wonky?
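    For reference, here is the extra info I can pull from the console for the odd drive; just generic smartctl/dmesg commands (sdac matches the screenshots):

        # Full identity block: model, serial, firmware, capacity.
        smartctl -i /dev/sdac

        # Everything smartctl can extract, in case -H alone is hiding something.
        smartctl -x /dev/sdac

        # Kernel messages for that device: link resets, INQUIRY failures, etc.
        dmesg | grep -i sdac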

     

     

    Preclear Error 2.jpg

    Preclear Errror 1.jpg

  10. Thank you for the help... df works, but I used "sudo du /mnt/user/system..." for each of the folders on the cache and found the full-size docker image being recreated. It looks like the docker image is not counted in the file-size listing via SMB/SFTP, and somehow (by me or automatically) the docker image's initial size was set to 100GB (which explains the usage after the reboot, but not before it). I disabled Docker, deleted the image, and everything is empty again.

     

    Does docker.img have different permissions that would cause SMB shares not to list it?
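    To make that concrete, this is what I can check from the console; I'm assuming the default docker vdisk location under the system share (adjust the path if yours differs):

        # Size and permissions of the docker vdisk on the cache.
        ls -lh /mnt/user/system/docker/docker.img

        # Nominal size vs. space actually allocated on disk.
        du -h --apparent-size /mnt/user/system/docker/docker.img
        du -h /mnt/user/system/docker/docker.img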

     

    For the future, is there an easy way to store the docker image and each container's data on the array instead of the cache drive?

     

    Thank you for your help!

  11. I'm running the latest 6.8.1 (the problem existed in 6.8.0) and have a btrfs cache pool (mirrored identical 120GB SSDs) that initially showed 75GB used. I confirmed the drives are almost EMPTY through the SMB cache share and through SFTP. I ran the mover script a few times with no change; then, thinking it could just be a glitch, I rebooted Unraid, and it now shows 111GB used. Verifying through the SMB cache share, still only a few GB are used.

    Is this a BTRFS issue or an Unraid issue?

    Are there files on the share that would not show up in an SMB listing or over SFTP?
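    In case it helps with a diagnosis, these are the numbers I can pull from the console; a sketch assuming the pool is mounted at /mnt/cache (standard btrfs-progs commands):

        # Space accounting as btrfs itself sees it (data vs. metadata, raid1 duplication).
        btrfs filesystem df /mnt/cache
        btrfs filesystem usage /mnt/cache

        # Per-directory usage on the pool, including anything SMB doesn't export.
        du -sh /mnt/cache/* 2>/dev/null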

     

     

    320038620_UnraidUsedCacheSpaceBadMath.thumb.jpg.312097856ae6752c970baaf2507abaa7.jpg

     

    804985445_UnraidUsedCacheSpace.jpg.c79e77c656fa9ecb07cf116f2b941217.jpg

     

     

    1872905803_UnraidUsedCacheSpaceBadMath2.thumb.jpg.38b1ffc0cb65338f61ef959e0f9195cc.jpg

  12. I saw this script, but since it was last updated 5 years ago and a few major tunables have been added/removed (inside and outside the GUI), I didn't want to assume it would work on the latest version of Unraid (it could suggest tunables that are no longer relevant)... Does anyone know if the script is still relevant/usable on the latest version of Unraid?
