-C-

Everything posted by -C-

  1. I had some trouble during the rebuild and had to restart the server, but the rebuild appeared to complete successfully. Any idea what could have caused this? This is the first time a disk has died without warning and I've needed to use parity. I'm assuming the data from the failed drive's gone. Fortunately, I have most of it backed up. I am using syslog to save the logs, so should have logs of everything if they're of use.
  2. What about linking the banner to Tools > Update OS, at least as an interim solution?
  3. Certainly possible. The system wasn't happy when I rebooted it (which was the reason for the reboot) and it may have killed hung processes in order to reboot. It certainly took longer than usual. (I used powerdown -r to restart in case that makes any difference.)
  4. In which case I'm stuck in a loop, for now- I rebooted the first time it happened and everything was stable for a day or so before it happened again, without a reason I can find. What's painful is that the rebuild is happening slowly- when I could last access the GUI I was getting around 10-30 MB/s, so I'll likely be stuck without a GUI for another day at least. I've not had a disk fail without warning before, so haven't had to rebuild from parity like this and am not sure whether that's normal. It's certainly running a lot slower than a correcting check. That's what happened when I rebooted part way through the rebuild yesterday- not sure if that's normal though. I've disabled the mover as the rebuild was stopping for the daily move and not restarting afterwards.
  5. Just tried logging into the Unraid GUI and am now getting an error. Load is pegged again:
     top - 12:24:40 up 1 day, 12:32,  1 user,  load average: 52.86, 52.47, 52.31
     Tasks: 1152 total,   1 running, 1151 sleeping,    0 stopped,    0 zombie
     %Cpu(s):  2.9 us,  5.2 sy,  0.0 ni, 82.7 id,  9.1 wa,  0.0 hi,  0.1 si,  0.0 st
     MiB Mem :  31872.3 total,   6157.1 free,  12252.2 used,  13463.0 buff/cache
     MiB Swap:      0.0 total,      0.0 free,      0.0 used.  18476.1 avail Mem
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     11805 root      20   0  975184 412108    516 S  93.7   1.3 158:06.42 /usr/local/bin/shfs /mnt/user -disks 31 -o default_permissions,allow_other,noatime -o remember=0
     29298 nobody    20   0  226844 109888  32124 S  22.5   0.3  12:33.84 /usr/lib/plexmediaserver/Plex Media Server
     12138 nobody    20   0  386324  70936  55404 S   6.0   0.2   0:00.28 php-fpm: pool www
      9015 root      20   0       0      0      0 S   4.3   0.0 137:11.80 [unraidd0]
     13271 nobody    20   0  386216  65896  50496 S   4.3   0.2   0:00.16 php-fpm: pool www
     22798 nobody    20   0  386744  79348  63384 S   4.3   0.2   0:17.22 php-fpm: pool www
      7495 root      20   0       0      0      0 D   1.0   0.0  26:23.74 [mdrecoveryd]
     Here's a list of installed plugins:
     root@Tower:~# ls /var/log/plugins/
     Python3.plg@  dynamix.cache.dirs.plg@  dynamix.system.temp.plg@  open.files.plg@  unRAIDServer.plg@  zfs.master.plg@
     appdata.backup.plg@  dynamix.file.integrity.plg@  dynamix.unraid.net.plg@  parity.check.tuning.plg@  unassigned.devices-plus.plg@
     community.applications.plg@  dynamix.file.manager.plg@  file.activity.plg@  qnap-ec.plg@  unassigned.devices.plg@
     disklocation-master.plg@  dynamix.s3.sleep.plg@  fix.common.problems.plg@  tips.and.tweaks.plg@  unbalance.plg@
     dynamix.active.streams.plg@  dynamix.system.autofan.plg@  intel-gpu-top.plg@  unRAID6-Sanoid.plg@  user.scripts.plg@
     I can access files on the array OK over the network, and the rebuild is still running, albeit very slowly:
     root@Tower:~# parity.check status
     Status: Parity Sync/Data Rebuild (65.2% completed)
     Any advice on what I can try to get the load back down?
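     (An aside for anyone else chasing a pegged shfs process: a rough sketch of how load and /mnt/user activity could be captured for later comparison with the syslog. It assumes lsof is available, and the log path on the flash is only an example.)
     # count open files on the user-share mount by process name
     # (lsof treats a mount point as a filesystem; this may be slow if the mount is wedged)
     lsof /mnt/user 2>/dev/null | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn | head
     # append the load average to the flash once a minute so it survives a crash
     while true; do echo "$(date '+%b %e %T') $(cat /proc/loadavg)" >> /boot/loadavg.log; sleep 60; done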
  6. Load is still climbing:
     load average: 57.57, 57.49, 57.00
     Looks like it could be related to Docker:
     root@Tower:/mnt/user/system# umount /var/lib/docker
     umount: /var/lib/docker: target is busy.
     The parity rebuild seems to be going much slower than it should be; I guess that's due to the high load. So I ran parity.check stop, and after doing so the GUI's now loading fine, but I'm getting a "Retry unmounting user share(s)" in the GUI footer. I tried a reboot but it's hung. Via SSH I tried stopping the Docker service, but it doesn't seem to be that:
     root@Tower:/mnt/disks# umount /var/lib/docker
     umount: /var/lib/docker: not mounted.
     I left it (wasn't sure what else to try) and eventually it restarted and things seem to be back to normal. I've now discovered that the Parity Tuning plugin doesn't/can't continue a Parity Sync/Data Rebuild in the same way that it can a correcting parity check, so it's back to the beginning with that. I'm going to avoid touching anything until the rebuild's finished.
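     (An aside rather than something I ran at the time: when umount reports "target is busy", these show which processes still hold the mount. It assumes fuser and lsof are present; rc.docker is the stock Unraid Docker service script.)
     fuser -vm /var/lib/docker      # verbose list of PIDs with files open on the mount
     lsof +f -- /var/lib/docker     # the same view via lsof
     /etc/rc.d/rc.docker stop       # stop the Docker service cleanly before unmounting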
  7. I checked the logs and found the crash happened around here:
     Sep 3 17:00:54 Tower webGUI: Successful login user root from 192.168.34.42
     Sep 3 17:01:25 Tower php-fpm[7836]: [WARNING] [pool www] server reached max_children setting (50), consider raising it
     Doing some further digging on that error, I found a post in which the poster traced the issue to the GPU Statistics plugin. I had just installed that a couple of days ago, so it would seem that this is likely the cause of my problem too. I successfully removed the plugin via CLI with:
     plugin remove gpustat.plg
     ...but after a few minutes the system load remains high and there's still no GUI. Looking like a reboot's my only option, but:
     Status: Parity Sync/Data Rebuild (65.3% completed)
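     (A speculative follow-up sketch rather than something from this post: removing a plugin doesn't kill processes it has already spawned, so it may be worth checking for leftovers. The process names below are guesses for a GPU statistics poller.)
     ps aux | grep -E 'gpustat|intel_gpu_top|nvidia-smi' | grep -v grep
     # kill <PID>    # for any stale poller still running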
  8. I was updating some docker containers through the Docker GUI page when the page froze. Checked top via SSH and got this:
     top - 17:58:46 up 1 day, 21:02,  1 user,  load average: 53.92, 53.54, 53.07
     Tasks: 1107 total,   3 running, 1104 sleeping,    0 stopped,    0 zombie
     %Cpu(s):  0.9 us,  2.6 sy,  0.0 ni, 54.9 id, 41.4 wa,  0.0 hi,  0.2 si,  0.0 st
     MiB Mem :  31872.3 total,   5800.0 free,  11092.2 used,  14980.1 buff/cache
     MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19566.8 avail Mem
       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     24398 root      20   0   34316  32632   1960 R  21.5   0.1   0:05.21 find
     12074 root      20   0  974656 409920    532 S  10.9   1.3 215:55.56 shfs
     18749 nobody    20   0  455592 102776  80756 S   5.3   0.3   0:02.04 php-fpm82
     21905 nobody    20   0  386252  77244  61756 R   4.6   0.2   0:00.31 php-fpm82
     18604 nobody    20   0  455788 105656  83604 S   4.0   0.3   0:02.86 php-fpm82
     11495 root       0 -20       0      0      0 S   3.3   0.0  41:37.86 z_rd_int_0
     11496 root       0 -20       0      0      0 S   3.3   0.0  41:39.55 z_rd_int_1
     11497 root       0 -20       0      0      0 S   3.3   0.0  41:38.24 z_rd_int_2
      7491 nobody    20   0 2711124 259320  24452 S   1.7   0.8   0:11.99 mariadbd
     Thing is, I'm part way through an array rebuild having replaced a failed HDD. Usually I would restart the server if the GUI becomes snafu, but in this case is it safe to do so (I have the Parity Check Tuning plugin installed), or is there a CLI command I can try to bring things back?
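     (Not part of the original question: a sketch of how the rebuild state can be checked from the CLI before risking a reboot. parity.check is provided by the Parity Check Tuning plugin; mdcmd is built in.)
     parity.check status                        # plugin's view of the running operation
     mdcmd status | grep -E 'mdState|mdResync'  # stock driver's state and resync position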
  9. Thanks for the super useful plugin. There's a typo on its settings page: Datasets Datasets Exclussion Patterns (Just One!): should be Datasets Datasets Exclusion Patterns (Just One!):
  10. I ran a large unBALANCE transfer and, typically, it all went without a hitch. Not sure why it took so long, but it completed successfully and when finished the Unraid GUI was still running fine. Will continue to monitor the situation and will post if things go awry again.
  11. Thanks for checking the diags- I have some more moving to do and will be sure to post the logs. Interesting. I've not been able to find another example of someone having the same issue, so somewhat reassuring to know I'm (possibly) not the only one. Then again, this was happening when I first started using Unraid last year (6.11.3). I didn't upgrade until the 6.12 releases came out and have kept up to date since then. Unfortunately the problem has persisted across all of the versions I've been on.
  12. I'm moving files around so I can start making use of ZFS. This issue has been going on for most of this year, but I've rarely needed to move large amounts of data around so haven't spent much time on troubleshooting. I started doing large moves using rsync via CLI and had problems, so moved to unBALANCE as it gives better visibility of what's going on. The crashing appears to be the same when I use either method. I find that the move completes correctly, but the main Unraid GUI becomes unreachable after a while. So for example, the whole move may take 5 hours, but the GUI becomes unreachable after 2 hours. If I try to connect with the "Array - Unraid Monitor" Android app while the GUI's borked, it displays a "Server error" message. When the Unraid GUI is down, the unBALANCE GUI is not affected and still runs fine. I created the attached diagnostics while unBALANCE was running. It was a 172GB move; it took nearly 50 mins and completed successfully. This is what top looked like while the GUI had crashed after an earlier, larger move. This was many hours after the move had finished. No dockers or VMs running: tower-diagnostics-20230820-1400.zip
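      (For reference, the kind of rsync invocation I mean by "large moves using rsync via CLI". The share and disk names here are placeholders, not my actual layout.)
      rsync -avh --progress /mnt/disk1/Media/ /mnt/disk4/Media/
      # copy first and verify before deleting the source; going disk-to-disk rather
      # than through /mnt/user keeps the transfer off the shfs user-share layer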
  13. Thanks to whoever's responsible for the Nicotine+ container. I tried a few times over the years to get it running on Windows and never had any joy. It fired straight up on my Unraid 6.12.2 and all looks good. A couple of things I noticed: The main thing is with the download folders specified in the template. I have them both set to reference subfolders within my /mnt/user/Audio/ share, but the files weren't being saved to those dirs. I eventually found that they were being saved, but into the container's config/.local/share/nicotine/downloads directory. I fixed this by going into the Nicotine+ prefs > Downloads and changing the incomplete and complete download folders to the container dirs specified by default in the template. I also had to go to prefs > Shares and manually add the container's shares folder that was specified by default. On the template, both descriptions for the complete and incomplete downloads are the same. In prefs > User Interface, the "Prefer dark mode" selection is being ignored.
  14. Plugin page description typo: "Currently know supported units are:" Should be "Currently known supported units are:"
  15. Couldn't find a more suitable topic- hope the relevant person sees this. On this page: https://unraid.net/services There is no British Standard Time timezone. It's either GMT/ Greenwich Mean Time (UTC), or BST/ British Summer Time (UTC+1) depending on whether we're in daylight savings or not.
  16. I have both of those running daily and although the PCT log entries stopped just after the mover started, the actual rebuild continued and completed seemingly successfully without any interaction on my part.
  17. Understood, thanks. (PS- I meant green thumbs, not dots)
  18. Thanks for this- there's no way I would've guessed to click on that warning. Good to see all green dots again.
  19. Thanks Dave- that makes things clearer. If only the standard messages were as descriptive as the Parity Check Tuning ones. I check in on my server most days and try to stay on top of app & plugin updates as soon as they become available. The Parity Check Tuning plugin is indeed on 2023.07.08 and I believe it was updated before I replaced the disk, but I'm not certain. Good luck with finding the cause of the monitoring task stopping. In my case all seemed good until the daily mover operation started.
  20. I'm still not 100% sure about what's going on with all this 😜 Here's an update with what happened. I followed the guide to replace the failing disk. The rebuild onto the new disk appears to have gone well, with no errors reported. What's strange is that there's nothing in the logs at the 10:00 timestamp that the parity result shows as the rebuild end time:
      Jul 10 06:45:11 Tower emhttpd: spinning down /dev/sde
      Jul 10 09:15:08 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 833rpm) to: 205 (80% @ 854rpm)
      Jul 10 09:20:14 Tower autofan: Highest disk temp is 44C, adjusting fan speed from: 205 (80% @ 868rpm) to: 230 (90% @ 834rpm)
      Jul 10 09:39:17 Tower emhttpd: read SMART /dev/sdh
      Jul 10 09:59:53 Tower webGUI: Successful login user root from 192.168.34.42
      Jul 10 10:00:43 Tower kernel: md: sync done. time=132325sec
      Jul 10 10:00:43 Tower kernel: md: recovery thread: exit status: 0
      Jul 10 10:05:23 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 869rpm) to: 205 (80% @ 907rpm)
      Jul 10 10:09:42 Tower emhttpd: spinning down /dev/sdh
      Jul 10 10:14:57 Tower webGUI: Successful login user root from 192.168.34.42
      Jul 10 10:15:29 Tower autofan: Highest disk temp is 42C, adjusting fan speed from: 205 (80% @ 869rpm) to: 180 (70% @ 854rpm)
      Jul 10 10:30:00 Tower webGUI: Successful login user root from 192.168.34.42
      Jul 10 10:30:34 Tower autofan: Highest disk temp is 41C, adjusting fan speed from: 180 (70% @ 850rpm) to: 155 (60% @ 853rpm)
      Jul 10 10:30:44 Tower emhttpd: spinning down /dev/sdg
      I can see this in the log when the rebuild starts:
      Jul 8 21:17:28 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
      Jul 8 21:17:28 Tower Parity Check Tuning: Parity Sync/Data Rebuild detected
      Jul 8 21:17:28 Tower Parity Check Tuning: DEBUG: Created cron entry for 6 minute interval monitoring
      Then I get the update every 6 minutes as expected:
      Jul 9 02:24:34 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
      Jul 9 02:30:20 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
      Jul 9 02:36:33 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
      Until here:
      Jul 9 02:42:20 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
      Jul 9 02:42:20 Tower Parity Check Tuning: DEBUG: detected that mdcmd had been called from sh with command mdcmd nocheck PAUSE
      Which happens a couple of minutes after this:
      Jul 9 02:40:01 Tower root: mover: started
      There are no further parity-related entries after that. I'm not sure whether I can consider things OK now, or whether I should be investigating further.
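      (Aside: the kind of grep that pulls entries like these out of the log; /var/log/syslog is the stock Unraid location.)
      grep 'Parity Check Tuning' /var/log/syslog
      grep -E 'md: sync done|recovery thread' /var/log/syslog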
  21. My issue with the 2 errors being found during parity check remains. I've now got a failing drive and have a new one to replace it with. I've successfully moved everything off the old drive. I had an unclean shutdown recently, and when Unraid came back up it ran an automatic correcting check, which finished today; this is the result from the log:
      Jul 8 03:18:43 Tower Parity Check Tuning: DEBUG: Automatic Correcting Parity-Check running
      Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584664
      Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584696
      Jul 8 03:19:25 Tower kernel: md: sync done. time=1844sec
      Jul 8 03:19:25 Tower kernel: md: recovery thread: exit status: 0
      The problem is with the same 2 sectors on parity P that have been coming up as bad since the middle of December, but not always. Both parity drives completed their SMART short self-tests without error. I'm unsure how best to proceed. My largest data disk is 18TB, the parities are 20TB, and these 2 problem sectors are right at the end of the 20TB, so outside the area holding data; I've also moved all of the data off the disk that I want to replace. Given that, do I just ignore the parity errors and follow this guide: https://docs.unraid.net/unraid-os/manual/storage-management#replacing-a-disk-to-increase-capacity or is there something else I can try? tower-diagnostics-20230708-1356.zip
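      (The rough arithmetic behind "right at the end of the 20TB", assuming the kernel is reporting standard 512-byte sectors:)
      echo $((39063584664 * 512))      # = 20000555347968 bytes, i.e. roughly 20.0 TB into the parity drive
      echo $((18000000000000 / 512))   # an 18TB data disk ends near sector 35156250000
      # so both corrected sectors lie well beyond the last data-disk sector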
  22. I'm having ongoing issues with my parity checking. About 80% of the checks return 2 errors on the same blocks. There doesn't seem to be an issue with the array; it's always the same 2 blocks, right at the end of the 20TB check. I have had an issue for the last couple of months where the parity check history is not loading, although it is available on the flash, as detailed in a separate post. The last time I ran a successful correcting parity check I used the command line. However, this time (a few weeks later) it's refusing to start and I'm not sure why. The array's started and everything else is working as it should as far as I can tell. Here's what I get:
      root@Tower:~# parity.check correct
      Not allowed as No array operation in progress already running
      I can't find any other instances of that error message online. I'm using the 2023.04.30 version of the plugin on v6.11.1 of Unraid.
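      (A hedged aside rather than something from this post: if the plugin wrapper refuses to start a check, one possible fallback is the stock mdcmd interface. The syntax below is from memory, so verify it before relying on it.)
      mdcmd status | grep -E 'mdState|mdResync'   # confirm no operation is actually in progress
      mdcmd check CORRECT                         # start a correcting check via the built-in driver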
  23. I've seen the official page here: https://wiki.unraid.net/Shrink_array which looks to have been written in 2016. I've also read a forum post from 2018 in which @JorgeB states "there might be an issue with it and latest unRAID releases, script author has been MIA for a while." Considering that was 5 years ago, what is the current recommended method, and is there a way that won't take forever (I have 2 x 20TB parity drives) and will allow me to keep my array running during the process?
  24. +1 for this- in order for Duplicacy to work efficiently, it needs to be able to access the individual files so it can avoid duplicating any files that it has already backed up. With both of the current settings (.tar.gz/.tar) it is not able to do this, as it does not have access to the individual files. Could we please have a 3rd option that doesn't place the files into an archive and instead just copies them into an individual folder for each docker? A rough sketch of what I mean is below. Many thanks
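      (The sketch below is only an illustration of the requested behaviour, not an existing appdata.backup option: a plain per-container copy that a deduplicating tool like Duplicacy can walk file by file. The destination path is an example.)
      mkdir -p /mnt/user/backups/appdata
      for d in /mnt/user/appdata/*/; do
        # mirror each container's appdata into its own plain folder
        rsync -a --delete "$d" "/mnt/user/backups/appdata/$(basename "$d")/"
      done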
  25. There's a typo on the Thunderbird Docker template: It's en_GB, not en_UK.