Posts posted by -C-
-
2 hours ago, JorgeB said:
I took that as a standard "formatting will remove all data on this disk" type warning. I didn't think any of that was relevant to what I was doing- I had an empty disk with no data on it to worry about. To me "...will normally lead to loss of all data on the disk being formatted" is about losing any data on the disk being added, it doesn't mention anything about a different format breaking the rebuild process.
-
Thanks Jorge,
I followed the official guide here: https://docs.unraid.net/unraid-os/manual/storage-management/#replacing-faileddisabled-disks
I had missed the point in the notes there "The rebuild process can never be used to change the format of a disk - it can only rebuild to the existing format."
I wasn't using the process to change the format, it was to replace a failed drive, so I skipped over this point. Then when it came to adding the new disk, I figured that as I was having to format the disk anyway, I might as well switch to ZFS so I can take advantage of some of its benefits over XFS like replication from cache etc.
Now that I think about how parity works, I realise that there was no way it could have worked with a different format type. It was my mistake. This was the first time I rebuilt a disk from parity, so it was all new to me. Fortunately, none of the data on that disk was irreplaceable and I have a backup of most of it.
I do think that there should be a bigger warning about this in the manual to make it clearer that a change of format will stop the rebuild from being able to work. Especially now that I'd imagine there are others like me who'd like to take advantage of ZFS since 6.12 and may be tempted to do what I did.
Even better would be if there's a way for the system to check whether you're trying to rebuild onto a disk with a different format than the one being replaced when attempting to start a rebuild and give a nice clear data loss warning.
-
I had some troubles during the rebuild and had to restart the server, but the rebuild appeared to complete successfully:
Any idea what could have caused this?
This is the first time a disk died without warning and I've needed to use parity. I'm assuming the data from the failed drive's gone. Fortunately, I have most of it backed up.
I am using syslog to save the logs, so should have logs of everything if they're of use.
-
On 9/5/2023 at 8:25 PM, ljm42 said:
Yeah, adding it to Tools > Update OS is just the first step. We'll deal with the banner in the future.
What about linking the banner to Tools > Update OS, at least as an interim solution?
-
Just now, itimpi said:
I wonder if you got an unclean shutdown (or the plugin erroneously thought one had happened) as that would stop anything being restarted.
Certainly possible. The system wasn't happy when I rebooted it (which was the reason for the reboot) and it may have killed hung processes in order to reboot. It certainly took longer than usual.
(I used powerdown -r to restart in case that makes any difference.)
-
2 hours ago, JorgeB said:
Probably only a reboot will help.
In which case I'm stuck in a loop, for now- I rebooted the first time it happened and everything was stable for a day or so before it happened again, without a reason I can find.
What's painful is that the rebuild is happening slowly- when I could last access the GUI I was getting around 10-30 MB/s, so I'll likely be stuck without a GUI for another day at least. I've not had a disk fail without warning before, so not had to rebuild from parity like this and am not sure whether that's normal. It's certainly running a lot slower than a correcting check.
38 minutes ago, itimpi said:
Are you sure? I am sure it used to. I will have to check this out again.
That's what happened when I rebooted part way through the rebuild yesterday. Not sure if that's normal though.
I've disabled mover as the rebuild was stopping for the daily move and not restarting afterwards.
-
Just tried logging into the Unraid GUI and am now getting a
Load is pegged again:
top - 12:24:40 up 1 day, 12:32,  1 user,  load average: 52.86, 52.47, 52.31
Tasks: 1152 total,   1 running, 1151 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.9 us,  5.2 sy,  0.0 ni, 82.7 id,  9.1 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  31872.3 total,   6157.1 free,  12252.2 used,  13463.0 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  18476.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
11805 root      20   0  975184 412108    516 S  93.7  1.3 158:06.42 /usr/local/bin/shfs /mnt/user -disks 31 -o default_permissions,allow_other,noatime -o remember=0
29298 nobody    20   0  226844 109888  32124 S  22.5  0.3  12:33.84 /usr/lib/plexmediaserver/Plex Media Server
12138 nobody    20   0  386324  70936  55404 S   6.0  0.2   0:00.28 php-fpm: pool www
 9015 root      20   0       0      0      0 S   4.3  0.0 137:11.80 [unraidd0]
13271 nobody    20   0  386216  65896  50496 S   4.3  0.2   0:00.16 php-fpm: pool www
22798 nobody    20   0  386744  79348  63384 S   4.3  0.2   0:17.22 php-fpm: pool www
 7495 root      20   0       0      0      0 D   1.0  0.0  26:23.74 [mdrecoveryd]
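For what it's worth, a load average this high alongside ~83% idle CPU usually points at processes stuck in uninterruptible sleep (state D), which count toward the load average without using any CPU. A quick generic Linux check (not Unraid-specific, just standard ps/awk):

```shell
# Processes in uninterruptible sleep ("D" state) inflate the load
# average without consuming CPU -- typical of processes hung on I/O
# while the array is busy rebuilding.
ps -eo state,pid,comm | awk '$1 ~ /^D/ {print $2, $3}'
```

Any PIDs it prints are the ones to look at first (here, mdrecoveryd is already visible in state D in the top output above).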
Here's a list of installed plugins:
root@Tower:~# ls /var/log/plugins/
Python3.plg@                 dynamix.cache.dirs.plg@      dynamix.system.temp.plg@  open.files.plg@            unRAIDServer.plg@             zfs.master.plg@
appdata.backup.plg@          dynamix.file.integrity.plg@  dynamix.unraid.net.plg@   parity.check.tuning.plg@   unassigned.devices-plus.plg@
community.applications.plg@  dynamix.file.manager.plg@    file.activity.plg@        qnap-ec.plg@               unassigned.devices.plg@
disklocation-master.plg@     dynamix.s3.sleep.plg@        fix.common.problems.plg@  tips.and.tweaks.plg@       unbalance.plg@
dynamix.active.streams.plg@  dynamix.system.autofan.plg@  intel-gpu-top.plg@        unRAID6-Sanoid.plg@        user.scripts.plg@
I can access files on the array OK over the network, rebuild is still running, albeit very slowly:
root@Tower:~# parity.check status
Status: Parity Sync/Data Rebuild (65.2% completed)
Any advice on what I can try to get the load back down?
-
Load is still climbing:
load average: 57.57, 57.49, 57.00
Looks like it could be related to Docker:
root@Tower:/mnt/user/system# umount /var/lib/docker
umount: /var/lib/docker: target is busy.
Parity rebuild seems to be going much slower than it should be, guess it's due to the high load.
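For a "target is busy" like that, identifying what still holds the mount usually beats guessing. A generic sketch (whether fuser/lsof are present on a stock Unraid install is an assumption):

```shell
# Show which processes still hold the mount point open before
# retrying the unmount; output depends entirely on the system.
fuser -vm /var/lib/docker || true
# Equivalent view with lsof, if installed:
lsof +D /var/lib/docker 2>/dev/null | head -n 20 || true
```

Once the holders are stopped, the plain umount should succeed; a lazy unmount (umount -l) is the last resort since it detaches the path immediately but only cleans up once it's no longer busy.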
So I ran:
parity.check stop
After doing so, the GUI's now loading fine, but I'm getting a "Retry unmounting user share(s)" message in the GUI footer.
I tried a reboot but it's hung.
Via SSH I tried stopping Docker service, but it doesn't seem to be that:
root@Tower:/mnt/disks# umount /var/lib/docker
umount: /var/lib/docker: not mounted.
I left it (wasn't sure what else to try) and eventually it restarted and things seem to be back to normal.
I've now discovered that the Parity Check Tuning plugin doesn't/can't continue a Parity Sync/Data Rebuild in the same way that it can a correcting parity check, so it's back to the beginning with that.
I'm going to avoid touching anything until the rebuild's finished.
-
I checked the logs and found the crash happened around here:
Sep  3 17:00:54 Tower webGUI: Successful login user root from 192.168.34.42
Sep  3 17:01:25 Tower php-fpm[7836]: [WARNING] [pool www] server reached max_children setting (50), consider raising it
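If the max_children ceiling itself ever turns out to be the bottleneck rather than a symptom, it can be raised in the pool config. This is only a sketch: the exact file location on Unraid (something like /etc/php-fpm.d/www.conf) is an assumption, the values are illustrative, and changes may not survive a reboot on a RAM-based OS:

```ini
; www pool -- raise the process cap from the default of 50.
; Size these to available RAM: each child can use tens of MB.
pm = dynamic
pm.max_children = 80
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
```

In my case the warning looks more like a symptom (requests piling up behind a hung shfs) than the root cause, so raising the limit alone probably wouldn't have helped.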
Doing some further digging on that error, I found this post:
In which the poster found the issue was due to the GPU Statistics plugin.
I had just installed that a couple of days ago, so it would seem that this is likely the cause of my problem too.
I successfully removed the plugin via CLI with
plugin remove gpustat.plg
...but after a few minutes the system load remained high and there was still no GUI.
Looking like a reboot's my only option, but:
Status: Parity Sync/Data Rebuild (65.3% completed)
-
I was updating some docker containers through the Docker GUI page when the page froze.
Checked top via SSH and got this:
top - 17:58:46 up 1 day, 21:02,  1 user,  load average: 53.92, 53.54, 53.07
Tasks: 1107 total,   3 running, 1104 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.9 us,  2.6 sy,  0.0 ni, 54.9 id, 41.4 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :  31872.3 total,   5800.0 free,  11092.2 used,  14980.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19566.8 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
24398 root      20   0   34316  32632   1960 R  21.5  0.1   0:05.21 find
12074 root      20   0  974656 409920    532 S  10.9  1.3 215:55.56 shfs
18749 nobody    20   0  455592 102776  80756 S   5.3  0.3   0:02.04 php-fpm82
21905 nobody    20   0  386252  77244  61756 R   4.6  0.2   0:00.31 php-fpm82
18604 nobody    20   0  455788 105656  83604 S   4.0  0.3   0:02.86 php-fpm82
11495 root       0 -20       0      0      0 S   3.3  0.0  41:37.86 z_rd_int_0
11496 root       0 -20       0      0      0 S   3.3  0.0  41:39.55 z_rd_int_1
11497 root       0 -20       0      0      0 S   3.3  0.0  41:38.24 z_rd_int_2
 7491 nobody    20   0 2711124 259320  24452 S   1.7  0.8   0:11.99 mariadbd
Thing is, I'm part way through an array rebuild having replaced a failed HDD. Usually I'd restart the server if the GUI becomes snafu, but in this case, is it safe to do so? (I have the Parity Check Tuning plugin installed.) Or is there a CLI command I can try to bring things back?
-
Thanks for the super useful plugin. There's a typo on its settings page:
Datasets
Datasets Exclussion Patterns (Just One!):
should be
Datasets
Datasets Exclusion Patterns (Just One!):
-
I ran a large unBALANCE transfer and, typically, it all went without a hitch.
Not sure why it took so long, but it completed successfully and when finished the Unraid GUI was still running fine. Will continue to monitor the situation and will post if things go awry again.
-
9 hours ago, JorgeB said:
You can enable the syslog server and post that after a crash, to see if there's something relevant logged, don't see anything in the diags posted.
Thanks for checking the diags- I have some more moving to do and will be sure to post the logs.
7 hours ago, sphbecker said:
I also use rsync and had an issue with my GUI crashing, but I never made the connection. For me, the issue seemed to stop when I upgraded to 6.12.3, but that could also be coincidence. I don't put a ton of data on my server to need to sync.
Interesting. I've not been able to find another example of someone having the same issue, so somewhat reassuring to know I'm (possibly) not the only one.
Then again, this was happening when I first started using Unraid last year (6.11.3). I didn't upgrade until the 6.12 releases came out and have kept up to date since then. Unfortunately the problem has persisted across all of the versions I've been on.
-
I'm moving files around so I can start making use of ZFS. This issue has been going on for most of this year, but I've rarely needed to move large amounts of data around so haven't spent much time on troubleshooting.
I started doing large moves using rsync via CLI and had problems, so moved to unBALANCE as it gives better visibility of what's going on. The crashing appears to be the same when I use either method.
I find that the move completes correctly, but the main Unraid GUI becomes unreachable after a while. So for example, the whole move may take 5 hours, but the GUI becomes unreachable after 2 hours. If I try to connect with the "Array - Unraid Monitor" Android app while the GUI's borked, it displays a "Server error" message. When the Unraid GUI is down, the unBALANCE GUI is not affected and still runs fine.
I created the attached diagnostics while unBALANCE was running. It was a 172GB move, took nearly 50 mins and completed successfully.
This is what top looked like while the GUI was crashed after an earlier, larger move. This was many hours after the move had finished. No dockers or VMs running:
-
Thanks to whoever's responsible for the Nicotine+ container. I tried a few times over the years to get it running on Windows and never had any joy. It fired straight up on my Unraid 6.12.2 and all looks good.
Couple of things I noticed:
Main thing is with the download folders specified in the template. I have them both set to reference subfolders within my /mnt/user/Audio/ share, but the files weren't being saved to those dirs. I eventually found that they were being saved, but in the container's config/.local/share/nicotine/downloads directory. I fixed this by going into the Nicotine+ prefs/Downloads and changing the incomplete and completed folders to the container's dirs as specified by default in the template.
I also had to go to prefs/Shares and manually add the container's shares folder that was specified by default.
On the template- both descriptions for the complete and incomplete downloads are the same.
In Prefs, in User Interface the "Prefer dark mode" selection is being ignored.
-
Plugin page description typo:
"Currently know supported units are:"
Should be
"Currently known supported units are:"
-
Couldn't find a more suitable topic- hope the relevant person sees this.
On this page: https://unraid.net/services
There is no British Standard Time timezone. It's either GMT/Greenwich Mean Time (UTC), or BST/British Summer Time (UTC+1), depending on whether we're in daylight saving time or not.
-
4 hours ago, itimpi said:
if you have mover or appdata backup running then when either of those is detected you are likely to get the plugin executing the pause but then not doing the resume so you need a manual resume to continue.
I have both of those running daily and although the PCT log entries stopped just after the mover started, the actual rebuild continued and completed seemingly successfully without any interaction on my part.
-
5 minutes ago, trurl said:
Don't just make the warning go away. An occasional CRC is OK to acknowledge. Any other type of warning might be a serious disk problem.
Understood, thanks.
(PS- I meant green thumbs, not dots)
-
On 10/8/2020 at 2:56 AM, trurl said:
On the Dashboard page, click on the SMART warning for the disk and you can acknowledge it. It will warn again if the count increases.
Thanks for this- there's no way I would've guessed to click on that warning. Good to see all green dots again.
-
Thanks Dave- that makes things clearer. If only the standard messages were as descriptive as the Parity Check Tuning ones.
I check in on my server most days and try to stay on top of app & plugin updates as soon as they become available. The Parity Check Tuning plugin is indeed on 2023.07.08 and I believe it was updated before I replaced the disk, but not certain.
Good luck with finding the cause of the stopping monitoring task. In my case all seemed good until the daily mover operation started.
-
On 7/10/2023 at 8:20 AM, JorgeB said:
most users won't know about that
I'm still not 100% sure about what's going on with all this 😜
Here's an update with what happened. I followed the guide to replace the failing disk. The rebuild onto the new disk appears to have gone well with no errors reported:
What's strange is that there's nothing in the logs at the 10:00 timestamp that the parity result shows as the rebuild end time:
Jul 10 06:45:11 Tower emhttpd: spinning down /dev/sde
Jul 10 09:15:08 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 833rpm) to: 205 (80% @ 854rpm)
Jul 10 09:20:14 Tower autofan: Highest disk temp is 44C, adjusting fan speed from: 205 (80% @ 868rpm) to: 230 (90% @ 834rpm)
Jul 10 09:39:17 Tower emhttpd: read SMART /dev/sdh
Jul 10 09:59:53 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:00:43 Tower kernel: md: sync done. time=132325sec
Jul 10 10:00:43 Tower kernel: md: recovery thread: exit status: 0
Jul 10 10:05:23 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 869rpm) to: 205 (80% @ 907rpm)
Jul 10 10:09:42 Tower emhttpd: spinning down /dev/sdh
Jul 10 10:14:57 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:15:29 Tower autofan: Highest disk temp is 42C, adjusting fan speed from: 205 (80% @ 869rpm) to: 180 (70% @ 854rpm)
Jul 10 10:30:00 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:30:34 Tower autofan: Highest disk temp is 41C, adjusting fan speed from: 180 (70% @ 850rpm) to: 155 (60% @ 853rpm)
Jul 10 10:30:44 Tower emhttpd: spinning down /dev/sdg
I can see this in the log when the rebuild starts:
Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
Jul  8 21:17:28 Tower Parity Check Tuning: Parity Sync/Data Rebuild detected
Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG: Created cron entry for 6 minute interval monitoring
Then I get the update every 6 minutes as expected:
Jul  9 02:24:34 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
Jul  9 02:30:20 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
Jul  9 02:36:33 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
Until here:
Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG: Parity Sync/Data Rebuild running
Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG: detected that mdcmd had been called from sh with command mdcmd nocheck PAUSE
Which happens a couple of minutes after this:
Jul 9 02:40:01 Tower root: mover: started
There are no further parity related entries after that.
I'm not sure whether I can consider things OK now, or whether I should be investigating further.
-
My issue with the 2 errors being found during parity check remains.
I've now got a failing drive and have a new one to replace it with. I've successfully moved everything off the old drive.
I had an unclean shutdown recently and when Unraid came back up it ran an automatic correcting check, which finished today. This is the result from the log:
Jul  8 03:18:43 Tower Parity Check Tuning: DEBUG: Automatic Correcting Parity-Check running
Jul  8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584664
Jul  8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584696
Jul  8 03:19:25 Tower kernel: md: sync done. time=1844sec
Jul  8 03:19:25 Tower kernel: md: recovery thread: exit status: 0
The problem is with the same 2 sectors on parity P that have been coming up as bad since the middle of December, but not always:
Both parity drives completed their SMART short self-tests without error.
I'm unsure how best to proceed. My largest data disk is 18TB and the parities are 20TB, so these 2 problem sectors are right at the end of the 20TB, outside the area holding data, and I've already moved all of the data off the disk I want to replace. Do I just ignore the parity errors and follow this guide: https://docs.unraid.net/unraid-os/manual/storage-management#replacing-a-disk-to-increase-capacity or is there something else I can try?
Array disk died & replaced, new disk empty after rebuild
in General Support
Posted
Makes me feel slightly less dumb if someone else has done the same thing!
Good luck, I hope you can put that disk in, but I'll leave that to someone more knowledgeable.