Posts posted by sonofdbn

  1. 25 minutes ago, dvd.collector said:

     

    EDIT, saw this on the PIA website:
    * The information contained within this article will only work with our Current/Previous Generation servers. Our Next Generation servers do not currently offer port-forwarding outside of the application. *

     

    So I guess as they upgrade their servers this will stop working.  Back to my other question then, can anyone recommend a different VPN that will work well with this docker?

    Thanks for pointing out how PIA will handle port-forwarding in future. I'll be in the same boat too, and will also be looking for suggestions for an alternative VPN.

  2. So in the end I changed the BTRFS cache pool to a single SSD (also BTRFS), re-created everything, and it was fine for a few months. Unfortunately today I got error messages from the Fix Common Problems plug-in: A) unable to write to cache and B) unable to write to Docker Image. I'm assuming that B is a consequence of A, but in any case I've attached diagnostics.

     

    Looking at the GUI, Docker is 34% full and the 1 TB cache drive, a SanDisk SSD, has about 20% free space.

     

    But looking at the log for the cache drive, I get a large repeating list of entries like this:

    Jul 6 18:36:57 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
    Jul 6 18:36:57 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
    Jul 6 18:36:57 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
    Jul 6 18:36:58 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
    Jul 6 18:36:58 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
    Jul 6 18:36:58 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
    Jul 6 18:36:59 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
    Jul 6 18:36:59 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
    Jul 6 18:36:59 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
    Jul 6 18:36:59 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
    Jul 6 18:36:59 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
    Jul 6 18:36:59 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
    Jul 6 18:37:00 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968

    Should I replace the SSD or is there something I can do with BTRFS to try to fix any errors?
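
    For reference, these are the checks I was thinking of running first, assuming the pool is still mounted at /mnt/cache (please say if a scrub is a bad idea on a drive throwing transid errors):

    # per-device error counters for the cache filesystem
    btrfs dev stats /mnt/cache

    # read-only scrub: verifies checksums and reports problems without changing anything
    btrfs scrub start -B -d -r /mnt/cache
    btrfs scrub status /mnt/cache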

    tower-diagnostics-20200706-1928.zip

  3. My APC UPS died a while ago and I'm only now getting round to replacing it. Probably it's just the battery, but there's not much information available from APC. I used to have my unRAID box, a Synology 4-bay box and my Win 10 PC running off it. In truth, that load might have been too much for the APC (650 VA, 390 W). But since I have to replace something anyway, I was wondering whether it's better to have one UPS per device or one big(ger) UPS. I only need the UPS to enable a clean shutdown; our power doesn't go out often, but a few times a year it goes out when there's lightning.

     

    I'm thinking that one UPS per device is probably the way to go because:

    - it's cheaper, or at least in the same ballpark as having one big device

    - not risking a single point of failure

    - that big device will probably weigh a ton

    - easier to control the shutdowns (no need for software to link to a master device)

    - takes up a bit more space overall, but the power cabling is less messy when the devices aren't close to each other, as in my case.

     

    Any thoughts about this?
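
    (For the single-UPS option, my understanding is that apcupsd - which unRAID uses for UPS support - can be pointed at another machine's apcupsd over the network instead of a local USB cable. Roughly like this in apcupsd.conf on the boxes that don't have the UPS attached; treat it as a sketch from memory, and the hostname is made up:)

    UPSCABLE ether
    UPSTYPE net
    # host:port of the machine the UPS is physically plugged into (3551 is the default NIS port)
    DEVICE master-box:3551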

  4. Yes, it's a downloads share for torrents. I did try using cache-prefer, but then of course some files did, correctly, go to the array. But I didn't like keeping the array disk spinning for reads. What I'd like to do is download to my unassigned device (SSD) and then manually move things I want to seed longer back to the cache drive. But I can't find any way of doing this in the docker I use (rtorrentvpn).
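
    (What I have in mind for the manual step, assuming the unassigned SSD is mounted somewhere like /mnt/disks/scratch - path made up - is just something along these lines, with rtorrent then re-pointed at the new location:)

    # copy a finished torrent's data back to the cache share, then remove the original
    rsync -av /mnt/disks/scratch/downloads/SomeTorrent/ /mnt/cache/downloads/SomeTorrent/ \
      && rm -r /mnt/disks/scratch/downloads/SomeTorrent/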

  5. I'm trying to look at what's going wrong on my server, and previously enabled the local syslog server, writing to my cache disk as outlined here:

    https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601. (It's in the FAQ for unRAID v6 topic.)

     

    The problem is that my cache drive might be part of the problem, so I'd like to avoid writing to it. Is there a way of writing the local syslog folder to an unassigned device? I have an unassigned SSD which I could use. When I try to select a local syslog folder through the GUI, I only get a choice of shared folders.
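
    (The crude workaround I'm considering if the GUI can't do it - assuming the SSD mounts at /mnt/disks/scratch, which is a made-up path - is just copying the syslog over on a schedule, e.g. via the User Scripts plugin or a cron entry:)

    # copy the live syslog to the unassigned SSD every 5 minutes
    */5 * * * * cp /var/log/syslog /mnt/disks/scratch/syslog-copy.txt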

     

     

  6. Looking at Fix Common Problems now, I see two errors: "Unable to write to cache (Drive mounted read-only or completely full)" and "Unable to write to Docker Image (Docker Image either full or corrupted)".

     

    According to the Dashboard, it doesn't look like the cache is full (90% utilisation, about 100 GB free). This is what I get (now) when I click Container Size on the Docker page in the GUI.

     

    Name                              Container     Writable          Log
    ---------------------------------------------------------------------
    calibre                             1.48 GB       366 MB      64.4 kB
    binhex-rtorrentvpn                  1.09 GB         -1 B       151 kB
    plex                                 723 MB       301 MB      6.26 kB
    CrashPlanPRO                         454 MB         -1 B      44.9 kB
    nextcloud                            354 MB         -1 B      4.89 kB
    mariadb                              351 MB         -1 B      9.62 kB
    pihole                               289 MB         -1 B      10.2 kB
    letsencrypt                          281 MB         -1 B      10.1 kB
    QDirStat                             210 MB         -1 B      19.4 kB
    duckdns                             20.4 MB      9.09 kB      5.18 kB
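
    (For comparison, the command-line view of docker space usage, which I've been cross-checking against, is:)

    # per-image and per-container disk usage, including writable layer sizes
    docker system df -v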

     

  7. I've re-created the docker image and all seemed fine. But this morning the log usage on the Dashboard jumped from 1% to 30% when I refreshed the browser. I did reinstall Plex yesterday, and prior to that the log was at 1% of memory (I have 31.4 GiB of usable RAM). Unfortunately it seems that the Dashboard doesn't necessarily update the log until you refresh the browser, so it's possible that the log size was higher than 1% earlier.

     

    Is the Log size on the Dashboard just the syslog, or does it include the docker log as well? The docker log is at 36 MB, while the syslog is only around 1 MB.
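
    (To see what's actually taking the space, on the assumption that the Dashboard Log figure measures the /var/log tmpfs, I ran something like:)

    # /var/log is RAM-backed on unRAID, so a full log means its RAM allocation is full
    du -sh /var/log/* | sort -h
    df -h /var/log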

     

    Diagnostics are attached.

    tower-diagnostics-20200427-0949.zip

  8. OK, I can do that, but I already re-created the docker image when I reinstalled the dockers on the re-formatted cache drive. Is there any way of reducing the chance of this corruption happening again? Or could the problem be coming from the appdata files that I backed up and then used when reinstalling the dockers?

  9. Sorry to be back again, but more problems.

     

    So I backed up what I could, then reformatted the cache drives and set up the same cache pool, then reinstalled most of the dockers and VMs. It was a pleasant surprise to find that just about everything that had been recovered was fine, including the VMs. As far as I could tell, nothing major was missing.

     

    Anyway, the server trundled along fine for a few days, but today the torrenting seemed a bit slow, so I looked at the Dashboard and found that the log was at 100%. I stopped my sole running VM and then tried to stop some dockers, but found I was unable to; they seemed to restart automatically. So I used the GUI to stop the Docker service, then tried to stop the array (not shut it down), but the disks couldn't be unmounted. I got the message "Array Stopping - Retry unmounting user share(s)" and nothing else happened after that. I grabbed the diagnostics and in the end used powerdown from the console to shut the server down.

     

    From what I can see in the diagnostics, it looks like there are a lot of BTRFS errors, so I'm not sure what I should do at this point. The array is still powered down.

    tower-diagnostics-20200424-1108.zip

  10. The command seems to be OK, and I'm now happily copying files off /x to the array. I swear that some files that couldn't be copied before are now copying across at a reasonable - even very good - speed. A few folders seem to be missing entirely, but everything I've tried so far has copied across with no problem. I'm hopeful that most of the files will be recovered.

     

    Thanks again for all the help.

  11. On 4/14/2020 at 11:05 PM, johnnie.black said:

    If you run a scrub it will identify all corrupt files, so any files not on that list will be OK.

    I changed a SATA cable on one of the cache drives in case that was a source of the weird online/offline access. Then I started the array with cache drives unassigned, and mounted the cache drives again at /x. Ran

     btrfs dev stats /x

    and got 0 errors (two drives in the pool):

    [/dev/sdd1].write_io_errs    0
    [/dev/sdd1].read_io_errs     0
    [/dev/sdd1].flush_io_errs    0
    [/dev/sdd1].corruption_errs  0
    [/dev/sdd1].generation_errs  0
    [/dev/sdb1].write_io_errs    0
    [/dev/sdb1].read_io_errs     0
    [/dev/sdb1].flush_io_errs    0
    [/dev/sdb1].corruption_errs  0
    [/dev/sdb1].generation_errs  0

    So time to scrub. I started the scrub, waited patiently for over an hour, then checked status, and found that it had aborted.

    Scrub started:    Thu Apr 16 19:15:12 2020
    Status:           aborted
    Duration:         0:00:00
    Total to scrub:   1.79TiB
    Rate:             0.00B/s
    Error summary:    no errors found

    Basically it had aborted immediately without any error message. (I know I shouldn't really complain, but Linux is sometimes not too worried about giving feedback.) I thought this might be because I had mounted the drives read-only, so I remounted them as normal read-write and scrubbed again. I waited a while, and the status looked good:

    Scrub started:    Thu Apr 16 21:01:53 2020
    Status:           running
    Duration:         0:18:06
    Time left:        1:07:25
    ETA:              Thu Apr 16 22:27:27 2020
    Total to scrub:   1.79TiB
    Bytes scrubbed:   388.31GiB
    Rate:             366.15MiB/s
    Error summary:    no errors found

    Waited patiently until I couldn't resist checking and found this:

    Scrub started:    Thu Apr 16 21:01:53 2020
    Status:           aborted
    Duration:         1:04:36
    Total to scrub:   1.79TiB
    Rate:             262.46MiB/s
    Error summary:    no errors found

    Again there was no message that the scrub had aborted; I had to run scrub status to see it. I ran btrfs dev stats again and got no errors. Maybe this is an edge case, given that the drive is apparently full?

     

    So is there anything else worth trying? I'm not expecting to recover everything, but was hoping to avoid having to re-create some of the VMs. What if I deleted some files (if I can) to clear space and then tried scrub again?
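
    (Before trying anything destructive, I plan to look at the kernel log right after the next aborted scrub, and at how the pool's space is allocated, in case lack of unallocated space is what's killing it - roughly:)

    # kernel messages around the time the scrub aborts
    dmesg | tail -n 50

    # data / metadata / unallocated breakdown for the pool
    btrfs filesystem usage /x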

  12. On 4/14/2020 at 12:54 AM, sonofdbn said:

    Unfortunately no joy. I went to the link you provided, and tried the first two approaches. My cache drives (btrfs pool) are sdd and sdb.

     

    1) Mount filesystem read only (non-destructive)

    I created mount point /x and then tried

    
    mount -o usebackuproot,ro /dev/sdd1 /x

    This gave me an error

    
    mount: /x: can't read superblock on /dev/sdd1.

    (Same result if I tried sdb1.) Then I tried

    
    mount -o ro,notreelog,nologreplay /dev/sdd1 /x

    This produced the same error.

     

    So I moved to

    2) BTRFS restore (non-destructive)

     

    I created the directory /mnt/disk4/restore. Then entered

    
    btrfs restore -v /dev/sdd1 /mnt/disk4/restore

    After a few seconds I got this error message:

    
    /dev/sdd1 is currently mounted.  Aborting.

    This looks odd, in that the disk is mounted and therefore presumably accessible, so I thought I should check whether I've missed anything so far.

    In trying to do the btrfs restore, I realised that it's probably not surprising that "1) Mount filesystem read only (non-destructive)" above didn't work, because the disk is already mounted - I haven't actually had a problem accessing it. And for the same reason, the last error message above isn't surprising either.

     

    So my problem was how to unmount the cache drives to try 1) again. Not sure if this is the best way, but I simply stopped the array and then tried 1) again. Now I have access to the cache drive at my /x mountpoint, at least in the console. But I was a bit stuck trying to use it in any practical way. I thought about starting up the array again so that I could copy the cache files to an array drive, but wasn't sure if the cache drive could be mounted both "normally" for unRAID and at mountpoint /x.

     

    In any case, I had earlier used mc to try to copy files from the cache drive to the array, and that hadn't worked. So I've now turned to WinSCP and am copying files from mountpoint /x to a local drive. The great thing is that it can happily ignore errors and continue, and it writes to a log. (No doubt there's some Linux way of doing this, but I didn't spend time looking.) Now I swear that some /appdata folders that generated errors when I tried copying earlier are copying just fine, with no problems. Or perhaps the problem files are simply not there any more ☹️. WinSCP can be very slow, but I think that's a result of the online/offline problem I had with some files, and at least it keeps chugging away without the horrible flashing that Teracopy did.
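
    (For anyone reading this later: the "Linux way" I didn't look up at the time is presumably something like rsync, which also keeps going past unreadable files and can write a log, e.g.:)

    # copy everything it can from the pool mounted at /x to an array disk,
    # logging what was copied and which files failed
    rsync -av --log-file=/mnt/disk4/rsync-cache-copy.log /x/ /mnt/disk4/cache_rescue/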

     

    But to my earlier point, can I start the array again, say in safe mode with GUI? I'd like to read some files off it. What would happen to the cache drive at mountpoint /x?

     

     

     

  13. 23 minutes ago, trurl said:

    Cache-only shares will go beyond the minimum. Or writes directly to cache will go beyond minimum. The purpose is to allow cache-prefer or cache-yes shares to overflow to the array so they don't go beyond minimum accidentally.

     

    Thanks, that's useful information. Got to re-think my setup when I eventually sort out the cache disk.

  14. 6 minutes ago, trurl said:

    One thing to consider is that mover can't move open files (such as seeds). One of the reasons I send torrents directly to the array. I have them on a share that only includes a single disk.

     

    But if you want to put torrents on cache, you should use a cache-prefer share so they can overflow to the array after cache gets less than Cache Minimum Free (Global Share Settings)

    I don't use mover at all (so not really using cache disk as a cache). I move files manually. But the cache-prefer share idea is excellent. I only "discovered" cache preferences a short time ago and didn't have a good idea of how they could be used.

     

    I didn't even know there was a Cache Minimum Free setting - but what happens when you hit the minimum free (ignoring for this discussion any cache-prefer shares)? Does this trigger a warning (and continue writing into the "buffer") or does it just act like a hard limit to the cache drive size?

  15. 48 minutes ago, johnnie.black said:

    As suspected, there are checksum errors; btrfs gives an I/O error when corrupted data is detected (and can't be fixed), so you know there's a problem.

     

    You can use btrfs restore (with the pool unmounted) to copy that data since it will ignore any checksum errors, but the data will still be corrupt.

    Can you give more details on btrfs restore? If the pool is unmounted, how do I access the drives? Or do I unmount the drives and then mount them individually as - I don't know - unassigned devices?
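
    (For reference, the form I think the command takes - once the device really is unmounted - is along these lines, though the unmounting part is exactly what I'm unsure about:)

    # run against the raw, unmounted device, writing whatever can be recovered to a directory on the array
    btrfs restore -v /dev/sdd1 /mnt/disk4/restore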

  16. 6 minutes ago, trurl said:

    My guess.

     

    You had a number of shares set cache-only for some reason (though some of those have files on the array). Do you really need all of those to stay on cache?

     

    No idea what you are using those for since their names have been anonymized. You have a pretty large cache so it should be pretty hard to fill it up. Are you trying to seed torrents from it or something? I always send torrents directly to the array.

    My cache-only shares are a bit messed up because I set some of them up before I understood how the cache could be used. In reality, I have /appdata, /domains and /isos there, as well as torrents.

     

    And, yes, torrents are also seeded from cache (didn't want to keep an array drive spinning). So that's probably the cause? I thought I left myself a reasonable margin (50GB) but perhaps I wasn't paying attention. I also left too many seeding on the cache because the latest versions of unRAID unfortunately slowed down file transfers from cache to array, so I didn't do transfers out as often as I used to.

     

    The bottom line, though, is that a full BTRFS cache pool can be a pretty bad problem. Is there any notification that I could have enabled? My experience is with Windows, where I usually get a "disk is low on space" message.
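
    (In the meantime, a crude safety net I'm thinking of adding myself - assuming the pool stays mounted at /mnt/cache - is a scheduled check along these lines:)

    #!/bin/bash
    # warn via the syslog when the cache pool drops below roughly 50 GB free
    FREE_KB=$(df --output=avail /mnt/cache | tail -1 | tr -d ' ')
    if [ "$FREE_KB" -lt $((50 * 1024 * 1024)) ]; then
        logger -t cache-check "WARNING: cache free space below 50 GB"
    fi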

  17. So it sounds like I've pretty much lost the data on the cache drive, although I might be lucky with some files (and it would take forever to work out which files are OK).

     

    That being the case, I think I should just try to recreate the cache drive from scratch. A real pain, but doable. Before I do, though, is there anything wrong with the drives themselves?

     

    And do you think it was the corruption that led to the cache drive being full or the other way round? Because if it was the other way round, I need to monitor what goes on in the cache drive more carefully in future. If the corruption was the issue, any idea what caused it?