trurl Posted April 14, 2020

Note that setting minimum free space larger than the largest file you expect to write doesn't mean that space will never be used. For example, say you set minimum free to 10GB, a disk has 20GB free, and you write a 15GB file. Unraid could choose another disk depending on other factors such as allocation method and split level, but it is allowed to choose that disk because it has more than the minimum free. If it does choose the disk, it will write the 15GB file to it. After that, the disk will have 5GB free, which is now less than the minimum, and Unraid will not choose the disk again until it has more than the minimum free again.
trurl Posted April 14, 2020

Or another example, one where you don't try to write a file larger than the minimum. Minimum is set to 10GB as before, and the disk has 11GB free. You write a 9GB file: the disk can be chosen, and afterwards it will have only 2GB free.
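The rule in the two examples above boils down to simple arithmetic. The sketch below uses the numbers from the first example; it is a simplification, since Unraid also weighs allocation method and split level when picking a disk.

```shell
# Simplified sketch of the minimum-free rule, using the 10GB/20GB/15GB example above.
# Unraid also considers allocation method and split level, which are ignored here.
min_free=10   # GB, the configured minimum free space
disk_free=20  # GB currently free on the disk
file_size=15  # GB, size of the file being written

# The disk is eligible because its free space exceeds the minimum:
if [ "$disk_free" -gt "$min_free" ]; then
  echo "disk eligible"
fi

# After the write it falls below the minimum and won't be chosen again:
disk_free=$((disk_free - file_size))
echo "free after write: ${disk_free}GB"
```

Note the check is only made before the write; nothing stops the write itself from taking the disk below the minimum.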
sonofdbn Posted April 15, 2020

OK, so the "buffer" is actually used. I swear that at one time I knew all of this 🙂
sonofdbn Posted April 15, 2020

On 4/14/2020 at 12:54 AM, sonofdbn said: Unfortunately no joy.

I went to the link you provided and tried the first two approaches. My cache drives (btrfs pool) are sdd and sdb.

1) Mount filesystem read only (non-destructive)

I created mount point /x and then tried

mount -o usebackuproot,ro /dev/sdd1 /x

This gave me an error: mount: /x: can't read superblock on /dev/sdd1. (Same result if I tried sdb1.) Then I tried

mount -o ro,notreelog,nologreplay /dev/sdd1 /x

This produced the same error. So I moved on to

2) BTRFS restore (non-destructive)

I created the directory /mnt/disk4/restore, then entered

btrfs restore -v /dev/sdd1 /mnt/disk4/restore

After a few seconds I got this error message: /dev/sdd1 is currently mounted. Aborting.

This looked odd (in that the disk is mounted and therefore presumably accessible), so I thought I should check whether I'd missed anything so far. In trying to do the btrfs restore, I realised that it might not be surprising that approach 1) didn't work, because the disk was already mounted; I haven't had a problem actually accessing the disk. And it's then not surprising to see the last error message above. So my problem was how to unmount the cache drives to try 1) again. Not sure if this is the best way, but I simply stopped the array and then tried 1) again. Now I have access to the cache drive at my /x mount point, at least in the console.

But I was a bit stuck trying to use it in any practical way. I thought about starting the array again so that I could copy the cache files to an array drive, but wasn't sure if the cache drive could be mounted both "normally" for unRAID and at mount point /x. In any case, I had earlier used mc to try to copy files from the cache drive to the array, and that hadn't worked. So I've now turned to WinSCP and am copying files from mount point /x to a local drive.
The great thing is that WinSCP can happily ignore errors and continue, and it writes to a log. (No doubt there's some Linux way of doing this, but I didn't spend time looking.) I swear that some /appdata folders that generated errors when I tried copying earlier are now copying just fine, with no problems. Or perhaps the problem files are just not there any more ☹️. WinSCP can be very slow, but I think that's a result of the online/offline problem I had with some files, and at least it keeps chugging away without the horrible flashing Teracopy did.

But to my earlier point: can I start the array again, say in safe mode with GUI? I'd like to read some files off it. What would happen to the cache drive at mount point /x?
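For reference, the sequence that eventually worked in the posts above can be condensed as follows. The device name (/dev/sdd1) and the /mnt/disk4/restore destination are specific to this system, and the array must be stopped first so the pool isn't already mounted.

```shell
# Recovery steps from the posts above (stop the array first so the pool is unmounted).
mkdir /x                                  # temporary mount point
mount -o usebackuproot,ro /dev/sdd1 /x    # read-only mount of the pool

# If the read-only mount still fails, a weaker fallback that skips log replay:
# mount -o ro,notreelog,nologreplay /dev/sdd1 /x

# Or copy files off without mounting at all:
# mkdir -p /mnt/disk4/restore
# btrfs restore -v /dev/sdd1 /mnt/disk4/restore
```

Once mounted at /x, files can be copied off with any tool that tolerates read errors (WinSCP was used here).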
JorgeB Posted April 15, 2020

It should be fine to start the array; cache should appear as unmountable. But to be safer, just unassign the cache devices before starting.
sonofdbn Posted April 16, 2020

On 4/14/2020 at 11:05 PM, johnnie.black said: If you run a scrub it will identify all corrupt files, so any files not on that list will be OK.

I changed a SATA cable on one of the cache drives in case that was a source of the weird online/offline access. Then I started the array with the cache drives unassigned, and mounted the cache drives again at /x. Ran btrfs dev stats /x and got 0 errors (two drives in the pool):

[/dev/sdd1].write_io_errs 0
[/dev/sdd1].read_io_errs 0
[/dev/sdd1].flush_io_errs 0
[/dev/sdd1].corruption_errs 0
[/dev/sdd1].generation_errs 0
[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0

So, time to scrub. I started the scrub, waited patiently for over an hour, then checked status, and found that it had aborted:

Scrub started: Thu Apr 16 19:15:12 2020
Status: aborted
Duration: 0:00:00
Total to scrub: 1.79TiB
Rate: 0.00B/s
Error summary: no errors found

Basically it had aborted immediately without any error message. (I know I shouldn't really complain, but Linux is sometimes not too worried about giving feedback.) I thought this might be because I had mounted the drives read-only, so I remounted read-write and scrubbed again. Waited a while, and the status looked good:

Scrub started: Thu Apr 16 21:01:53 2020
Status: running
Duration: 0:18:06
Time left: 1:07:25
ETA: Thu Apr 16 22:27:27 2020
Total to scrub: 1.79TiB
Bytes scrubbed: 388.31GiB
Rate: 366.15MiB/s
Error summary: no errors found

Waited patiently until I couldn't resist checking, and found this:

Scrub started: Thu Apr 16 21:01:53 2020
Status: aborted
Duration: 1:04:36
Total to scrub: 1.79TiB
Rate: 262.46MiB/s
Error summary: no errors found

Again no message that the scrub had aborted; I had to run scrub status to see that it had. Ran btrfs dev stats and again got no errors.
Maybe this is an edge case, given that the drive is apparently full? Is there anything else worth trying? I'm not expecting to recover everything, but was hoping to avoid having to re-create some of the VMs. What if I deleted some files (if I can) to clear space and then tried the scrub again?
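The scrub in the post above was driven from the command line; the commands involved are roughly these. The /x mount point is this thread's temporary mount, and running scrub with -B is an alternative that keeps it in the foreground, so an abort is reported directly instead of being discovered later via polling.

```shell
btrfs scrub start /x     # begin scrubbing the mounted pool (read-write mount needed for repairs)
btrfs scrub status /x    # poll progress; this is where "Status: aborted" showed up

# Foreground alternative: blocks until done and reports the final result itself
# btrfs scrub start -B /x

btrfs dev stats /x       # per-device error counters, checked before and after the scrub
```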
JorgeB Posted April 16, 2020

If the scrub isn't working, the best bet is to move the data; all data successfully moved can be considered OK. Any files that give you an i/o error during the transfer are corrupt. You can get them with btrfs restore, but as mentioned before, they will still be corrupt.
sonofdbn Posted April 16, 2020

Thanks for all the help so far. Given that my cache pool had two drives, sdd and sdb, is this the correct command to mount them?

mount -o usebackuproot /dev/sdd1 /x
sonofdbn Posted April 17, 2020

The command seems to be OK. I am now happily copying files off /x to the array. I swear that some files that couldn't be copied before are now copying across at a reasonable, even very good, speed. A few folders seem to be missing entirely, but everything I've tried so far has copied across with no problem. I'm hopeful that most of the files will be recovered. Thanks again for all the help.
JorgeB Posted April 17, 2020

On 4/16/2020 at 6:19 PM, sonofdbn said: Thanks for all the help so far. Given that my cache pool had two drives, sdd and sdb, is this the correct command to mount them?

Sorry, missed this one, but yes; you can use either one. It's in the FAQ.
sonofdbn Posted April 24, 2020

Sorry to be back again, but more problems. So I backed up what I could, then reformatted the cache drives, set up the same cache pool, and reinstalled most of the dockers and VMs. It was a pleasant surprise to find that just about everything that had been recovered was fine, including the VMs; as far as I could tell, nothing major was missing.

Anyway, the server trundled along fine for a few days, but today the torrenting seemed a bit slow, so I looked at the dashboard and found that the log was at 100%. So I stopped my sole running VM and then tried to stop some dockers but found I was unable to; they seemed to restart automatically. In the end I used the GUI to stop the Docker service, then tried to stop the array (not shutdown), but the disks couldn't be unmounted. I got the message "Array Stopping - Retry unmounting user share(s)" and nothing else happened after that. I grabbed the diagnostics, and in the end I used powerdown from the console and shut down the array.

From what I can see in the diagnostics, it looks like there are a lot of BTRFS errors. So I'm not sure what I should do at this point. The array is still powered down.

tower-diagnostics-20200424-1108.zip
JorgeB Posted April 24, 2020

The docker image is corrupt; delete and re-create it (see the docker FAQ if needed). The cache filesystem seems OK from what I can see.
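A minimal sketch of the delete-and-recreate step, assuming Unraid's default docker image location; check Settings > Docker for the actual path on your system. In practice this is normally done entirely from the GUI, as the docker FAQ describes.

```shell
# Assumes the default image path; verify yours under Settings > Docker first.
# 1. Disable the Docker service in the GUI (Settings > Docker > Enable Docker: No).
rm /mnt/user/system/docker/docker.img
# 2. Re-enable Docker in the GUI; a fresh image is created automatically.
# 3. Reinstall containers via Apps > Previous Apps (appdata and settings are kept).
```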
sonofdbn Posted April 24, 2020

OK, I can do that, but I already re-created the docker image when I reinstalled the dockers on the re-formatted cache drive. Is there any way of reducing the chance of this corruption happening again? Or could there be some problem with the appdata files I backed up and used to reinstall the dockers that is causing this?
trurl Posted April 24, 2020

9 minutes ago, sonofdbn said: OK, I can do that, but I already re-created the docker image when I reinstalled the dockers on the re-formatted cache drive.

The docker image doesn't look full in those diagnostics now, but are you sure you didn't fill it? Since the appdata for a container contains the settings for that application, such as paths, it is certainly possible that the appdata contained incorrect settings, or was missing some settings, that would cause you to fill the docker image. Did you check the settings for the applications before you used them again?
sonofdbn Posted April 24, 2020

Can't say I did check the settings. But I'm sure the docker image wasn't anywhere near full when I looked at the dashboard. I remember thinking that there was significantly more free space, since I'd left out a number of dockers when I did the reinstall.
sonofdbn Posted April 27, 2020

I've re-created the docker image and all seemed fine. But this morning the log usage on the Dashboard jumped from 1% to 30% when I refreshed the browser. I reinstalled Plex yesterday, and prior to that the log was at 1% of memory (I have 31.4 GiB of usable RAM). Unfortunately it seems that the Dashboard doesn't necessarily update the log figure until you refresh the browser, so it's possible that the log size was higher than 1% earlier. Is the Log size on the Dashboard just the syslog, or also the docker log? Because the docker log is at 36MB, while the syslog is only around 1MB. Diagnostics are attached.

tower-diagnostics-20200427-0949.zip
sonofdbn Posted April 27, 2020

Took a quick look at the logs in the above diagnostics and they seem to have omitted docker.log.1, so I've attached it here after editing out many similar lines. (I think the file size was too big to upload.)

The Plex container ID is 8138acb243f3 and the Pihole container ID is 358107cb7f64; these two seem to come up in the logs.

docker.log.1.txt
sonofdbn Posted April 27, 2020

Now looking at Fix Common Problems, I see errors: "Unable to write to cache (Drive mounted read-only or completely full.)" and "Unable to write to Docker Image (Docker Image either full or corrupted.)". According to the Dashboard, it doesn't look like the cache is full (90% utilisation, about 100 GB free). This is what I get (now) when I click Container Size on the Docker page in the GUI:

Name                 Container  Writable  Log
calibre              1.48 GB    366 MB    64.4 kB
binhex-rtorrentvpn   1.09 GB    -1 B      151 kB
plex                 723 MB     301 MB    6.26 kB
CrashPlanPRO         454 MB     -1 B      44.9 kB
nextcloud            354 MB     -1 B      4.89 kB
mariadb              351 MB     -1 B      9.62 kB
pihole               289 MB     -1 B      10.2 kB
letsencrypt          281 MB     -1 B      10.1 kB
QDirStat             210 MB     -1 B      19.4 kB
duckdns              20.4 MB    9.09 kB   5.18 kB
JorgeB Posted April 27, 2020

The cache filesystem is corrupt again. If it's happening multiple times without an apparent reason, you likely have some hardware issue.
trurl Posted April 27, 2020

8 hours ago, sonofdbn said: doesn't look like cache is full (90% utilisation, about 100 GB free).

No, but considering the size of the cache, that is a lot of data on it. Any idea what is taking all that space? Apart from the usual "system" shares, it looks like you have one starting with 'd' that is cache-only; I am guessing this is a downloads share. Does it really need to be cache-only? I can understand wanting to post-process on cache, but you might consider making this cache-yes so it can at least overflow, and so anything that sits there too long gets moved to the array. On the other hand, most of your array is mostly full, so you might want to consider getting more capacity there as well.
sonofdbn Posted April 27, 2020

Yes, it's a downloads share for torrents. I did try using cache-prefer, but then of course some files did, correctly, go to the array, and I didn't like keeping the array disk spinning for reads. What I'd like to do is download to my unassigned device (SSD) and then manually move things I want to seed longer back to the cache drive, but I can't find any way of doing this in the docker I use (rtorrentvpn).
trurl Posted April 27, 2020

Why not just download to the UD and leave them seeding from the UD?
sonofdbn Posted April 28, 2020

The UD is 500GB, and it's likely to be too small, especially taking into account other stuff I want to put on it.
sonofdbn Posted July 6, 2020

So in the end I changed the BTRFS cache pool to a single SSD (also BTRFS), re-created everything, and was fine for a few months. Unfortunately, today I got error messages from the Fix Common Problems plug-in: A) unable to write to cache, and B) unable to write to Docker Image. I'm assuming that B is a consequence of A, but anyway I've attached diagnostics. Looking at the GUI, Docker is 34% full and the 1 TB cache drive, a SanDisk SSD, has about 20% free space. But looking at the log for the cache drive, I get a large repeating list of entries like this:

Jul 6 18:36:57 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
Jul 6 18:36:57 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
Jul 6 18:36:57 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
Jul 6 18:36:58 Tower kernel: BTRFS error (device sdd1): parent transid verify failed on 432263856128 wanted 2473752 found 2472968
Jul 6 18:36:58 Tower kernel: BTRFS info (device sdd1): no csum found for inode 39100 start 3954470912
Jul 6 18:36:58 Tower kernel: BTRFS warning (device sdd1): csum failed root 5 ino 39100 off 3954470912 csum 0x86885e78 expected csum 0x00000000 mirror 1
(the same three lines repeat every second or so)

Should I replace the SSD, or is there something I can do with BTRFS to try to fix the errors?

tower-diagnostics-20200706-1928.zip
JorgeB Posted July 6, 2020

Checksum errors suggest a hardware problem, like bad RAM.
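One way to keep an eye on whether errors recur after swapping suspect hardware is btrfs's per-device error counters, already used earlier in this thread. The mount point below assumes the pool is mounted at Unraid's usual /mnt/cache.

```shell
btrfs dev stats /mnt/cache      # cumulative write/read/flush/corruption/generation error counts
btrfs dev stats -z /mnt/cache   # print and then reset the counters, so any new errors stand out
```

Non-zero corruption_errs or generation_errs after a hardware change would point at the remaining components (RAM, cabling, controller, or the SSD itself).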