Posts posted by -C-

  1. 2 hours ago, JorgeB said:

    There already is a warning not to format disks during a rebuild, and the text has been changed a couple of times to make it as clear as possible, but it seems it doesn't help in some cases.

     

    Format warning new v6.8.png

     

    I took that as a standard "formatting will remove all data on this disk" type of warning. I didn't think any of it was relevant to what I was doing: I had an empty disk with no data on it to worry about. To me, "...will normally lead to loss of all data on the disk being formatted" is about losing any data on the disk being added; it doesn't mention anything about a different format breaking the rebuild process.

  2. Thanks Jorge,

     

    I followed the official guide here: https://docs.unraid.net/unraid-os/manual/storage-management/#replacing-faileddisabled-disks

     

    I had missed the point in the notes there "The rebuild process can never be used to change the format of a disk - it can only rebuild to the existing format."

    I wasn't using the process to change the format; I was replacing a failed drive, so I skipped over this point. Then, when it came to adding the new disk, I figured that since I had to format the disk anyway, I might as well switch to ZFS so I could take advantage of some of its benefits over XFS, like replication from the cache.

     

    Now that I think about how parity works, I realise that there was no way it could have worked with a different format type. It was my mistake. This was the first time I rebuilt a disk from parity, so it was all new to me. Fortunately, none of the data on that disk was irreplaceable and I have a backup of most of it.

     

    I do think there should be a bigger warning about this in the manual, making it clearer that a change of format will stop the rebuild from working, especially now that I imagine there are others like me who'd like to take advantage of ZFS since 6.12 and may be tempted to do what I did.
    Even better would be if, when a rebuild is about to start, the system could check whether the target disk has a different format from the one being replaced and give a nice clear data-loss warning.
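
    Even from the CLI it's easy to see what filesystem an array device currently carries before deciding to format anything; a quick check along these lines should have told me the emulated disk was still XFS (the device path here is only an example, substitute the disk in question):

    # print the filesystem type of an array device before touching it
    # (/dev/md1p1 is just an illustration -- use the device for the disk being rebuilt)
    blkid -o value -s TYPE /dev/md1p1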

  3.  

     

    I had some trouble during the rebuild and had to restart the server, but the rebuild appeared to complete successfully:

    [screenshot attached]

     

    Any idea what could have caused this?
    This is the first time a disk has died without warning and I've needed to use parity.

     

    I'm assuming the data from the failed drive's gone. Fortunately, I have most of it backed up.

     

    I'm using the syslog server to save the logs, so I should have a record of everything if it's of use.
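
    To hunt for whatever happened to the disk, I can search the saved log for errors with something like this (path is the standard local syslog; adjust to wherever your syslog server writes):

    # pull error lines out of the saved log
    grep -i error /var/log/syslog | head -50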

  4. Just now, itimpi said:

    I wonder if you got an unclean shutdown (or the plugin erroneously thought one had happened) as that would stop anything being restarted.

    Certainly possible. The system wasn't happy when I rebooted it (which was the reason for the reboot) and it may have killed hung processes in order to reboot. It certainly took longer than usual.

     

    (I used powerdown -r to restart in case that makes any difference.)

  5. 2 hours ago, JorgeB said:

    Probably only a reboot will help.

    In which case I'm stuck in a loop for now: I rebooted the first time it happened and everything was stable for a day or so before it happened again, without any reason I can find.

     

    What's painful is that the rebuild is running slowly: when I could last access the GUI I was getting around 10-30 MB/s, so I'll likely be stuck without a GUI for at least another day. I've not had a disk fail without warning before, so I've never had to rebuild from parity like this, and I'm not sure whether that's normal. It's certainly running a lot slower than a correcting check.
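
    As a rough sanity check on why that feels so slow (the sizes here are purely illustrative, assuming a 20TB disk rebuilt at a sustained 20 MB/s):

    # back-of-envelope rebuild time: bytes / observed speed
    echo $(( 20 * 10**12 / (20 * 10**6) ))        # ~1,000,000 seconds at 20 MB/s
    echo $(( 20 * 10**12 / (20 * 10**6) / 3600 )) # ~277 hours, i.e. more than 11 days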

     

    38 minutes ago, itimpi said:


    Are you sure?   I am sure it used to.   I will have to check this out again.

    That's what happened when I rebooted part way through the rebuild yesterday. Not sure if that's normal though.

    I've disabled the mover, as the rebuild was stopping for the daily move and not restarting afterwards.

  6. Just tried logging into the Unraid GUI and am now getting this:

    [screenshot attached]

     

    Load is pegged again:

    top - 12:24:40 up 1 day, 12:32,  1 user,  load average: 52.86, 52.47, 52.31
    Tasks: 1152 total,   1 running, 1151 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  2.9 us,  5.2 sy,  0.0 ni, 82.7 id,  9.1 wa,  0.0 hi,  0.1 si,  0.0 st
    MiB Mem :  31872.3 total,   6157.1 free,  12252.2 used,  13463.0 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  18476.1 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    11805 root      20   0  975184 412108    516 S  93.7   1.3 158:06.42 /usr/local/bin/shfs /mnt/user -disks 31 -o default_permissions,allow_other,noatime -o remember=0
    29298 nobody    20   0  226844 109888  32124 S  22.5   0.3  12:33.84 /usr/lib/plexmediaserver/Plex Media Server
    12138 nobody    20   0  386324  70936  55404 S   6.0   0.2   0:00.28 php-fpm: pool www
     9015 root      20   0       0      0      0 S   4.3   0.0 137:11.80 [unraidd0]
    13271 nobody    20   0  386216  65896  50496 S   4.3   0.2   0:00.16 php-fpm: pool www
    22798 nobody    20   0  386744  79348  63384 S   4.3   0.2   0:17.22 php-fpm: pool www
     7495 root      20   0       0      0      0 D   1.0   0.0  26:23.74 [mdrecoveryd]

     

    Here's a list of installed plugins:

    root@Tower:~# ls /var/log/plugins/
    Python3.plg@                 dynamix.cache.dirs.plg@      dynamix.system.temp.plg@  open.files.plg@           unRAIDServer.plg@             zfs.master.plg@
    appdata.backup.plg@          dynamix.file.integrity.plg@  dynamix.unraid.net.plg@   parity.check.tuning.plg@  unassigned.devices-plus.plg@
    community.applications.plg@  dynamix.file.manager.plg@    file.activity.plg@        qnap-ec.plg@              unassigned.devices.plg@
    disklocation-master.plg@     dynamix.s3.sleep.plg@        fix.common.problems.plg@  tips.and.tweaks.plg@      unbalance.plg@
    dynamix.active.streams.plg@  dynamix.system.autofan.plg@  intel-gpu-top.plg@        unRAID6-Sanoid.plg@       user.scripts.plg@

     

    I can access files on the array OK over the network, and the rebuild is still running, albeit very slowly:

    root@Tower:~# parity.check status
    Status:  Parity Sync/Data Rebuild  (65.2% completed)

    Any advice on what I can try to get the load back down?
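
    One thing I can check from the CLI is which processes have files open under /mnt/user, since shfs is the process pegged in top above (just a starting point, not something I've confirmed helps):

    # list processes holding files open on the user share mount
    lsof /mnt/user 2>/dev/null | head -40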

     

  7. Load is still climbing:

    load average: 57.57, 57.49, 57.00

    Looks like it could be related to Docker:

     

    [screenshot attached]

     

    root@Tower:/mnt/user/system# umount /var/lib/docker
    umount: /var/lib/docker: target is busy.

     

    The parity rebuild seems to be going much slower than it should; I guess it's due to the high load.

    So I ran

    parity.check stop

    After doing so the GUI is now loading fine, but I'm getting a "Retry unmounting user share(s)" message in the GUI footer.

     

    I tried a reboot but it's hung.

     

    Via SSH I tried stopping the Docker service, but it doesn't seem to be that:

    root@Tower:/mnt/disks# umount /var/lib/docker
    umount: /var/lib/docker: not mounted.

    I left it (I wasn't sure what else to try), and eventually it restarted and things seem to be back to normal.

     

    I've now discovered that the Parity Check Tuning plugin doesn't (or can't) resume a Parity Sync/Data Rebuild in the same way it can a correcting parity check, so it's back to the beginning with that.

     

    I'm going to avoid touching anything until the rebuild's finished.
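
    For next time: when umount reports "target is busy" like it did above, something like this should show what's actually holding the mount (assuming fuser from psmisc is available, which I believe it is):

    # list processes keeping /var/lib/docker busy
    fuser -vm /var/lib/docker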

  8. I checked the logs and found the crash happened around here:
    
    Sep  3 17:00:54 Tower webGUI: Successful login user root from 192.168.34.42
    Sep  3 17:01:25 Tower php-fpm[7836]: [WARNING] [pool www] server reached max_children setting (50), consider raising it

     

    Doing some further digging on that error, I found another thread in which the poster traced the issue to the GPU Statistics plugin.

    I had just installed that a couple of days ago, so it would seem that this is likely the cause of my problem too.

     

    I successfully removed the plugin via CLI with

    plugin remove gpustat.plg

    ...but after a few minutes the system load remains high and there's still no GUI.
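
    One way to see whether the pool is still maxed out is simply to count the php-fpm processes against the limit of 50 from the warning:

    # count running php-fpm workers
    pgrep -fc 'php-fpm'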

     

    Looking like a reboot's my only option, but

    Status:  Parity Sync/Data Rebuild  (65.3% completed)
  9. I was updating some docker containers through the Docker GUI page when the page froze.

     

    Checked top via SSH and got this:
    
    top - 17:58:46 up 1 day, 21:02,  1 user,  load average: 53.92, 53.54, 53.07
    Tasks: 1107 total,   3 running, 1104 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  0.9 us,  2.6 sy,  0.0 ni, 54.9 id, 41.4 wa,  0.0 hi,  0.2 si,  0.0 st
    MiB Mem :  31872.3 total,   5800.0 free,  11092.2 used,  14980.1 buff/cache
    MiB Swap:      0.0 total,      0.0 free,      0.0 used.  19566.8 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    24398 root      20   0   34316  32632   1960 R  21.5   0.1   0:05.21 find
    12074 root      20   0  974656 409920    532 S  10.9   1.3 215:55.56 shfs
    18749 nobody    20   0  455592 102776  80756 S   5.3   0.3   0:02.04 php-fpm82
    21905 nobody    20   0  386252  77244  61756 R   4.6   0.2   0:00.31 php-fpm82
    18604 nobody    20   0  455788 105656  83604 S   4.0   0.3   0:02.86 php-fpm82
    11495 root       0 -20       0      0      0 S   3.3   0.0  41:37.86 z_rd_int_0
    11496 root       0 -20       0      0      0 S   3.3   0.0  41:39.55 z_rd_int_1
    11497 root       0 -20       0      0      0 S   3.3   0.0  41:38.24 z_rd_int_2
     7491 nobody    20   0 2711124 259320  24452 S   1.7   0.8   0:11.99 mariadbd

    Thing is, I'm partway through an array rebuild, having replaced a failed HDD. Usually I'd restart the server when the GUI gets into this state, but in this case is it safe to do so (I have the Parity Check Tuning plugin installed), or is there a CLI command I can try to bring things back?
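
    In the meantime I can at least see what that find process (PID 24398 in the top output above) is actually doing, e.g.:

    # show the full command line and working directory of the busy find process
    cat /proc/24398/cmdline | tr '\0' ' '; echo
    ls -l /proc/24398/cwd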

     

     

  10. 9 hours ago, JorgeB said:

    You can enable the syslog server and post that after a crash to see if there's something relevant logged; I don't see anything in the diags posted.

     

    Thanks for checking the diags. I have some more moving to do and will be sure to post the logs.

     

    7 hours ago, sphbecker said:

    I also use rsync and had an issue with my GUI crashing, but I never made the connection. For me, the issue seemed to stop when I upgraded to 6.12.3, but that could also be coincidence. I don't put a ton of data on my server to need to sync.

    Interesting. I've not been able to find another example of someone having the same issue, so it's somewhat reassuring to know I'm (possibly) not the only one.

    Then again, this was happening when I first started using Unraid last year (6.11.3). I didn't upgrade until the 6.12 releases came out and have kept up to date since then. Unfortunately the problem has persisted across all of the versions I've been on.

  11. I'm moving files around so I can start making use of ZFS. This issue has been going on for most of this year, but I've rarely needed to move large amounts of data around, so I haven't spent much time on troubleshooting.

     

    I started doing large moves using rsync via the CLI and had problems, so I moved to unBALANCE as it gives better visibility of what's going on. The crashing appears to be the same with either method.
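
    For reference, the CLI moves were plain rsync runs along these lines (the flags and paths here are only an illustration, not my exact command):

    # example disk-to-disk move; paths are placeholders
    rsync -avX --progress --remove-source-files /mnt/disk1/Media/ /mnt/disk3/Media/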

     

    I find that the move completes correctly, but the main Unraid GUI becomes unreachable after a while. So, for example, the whole move may take 5 hours, but the GUI becomes unreachable after 2 hours. If I try to connect with the "Array - Unraid Monitor" Android app while the GUI is down, it displays a "Server error" message. When the Unraid GUI is down, the unBALANCE GUI is not affected and still runs fine.

     

    I created the attached diagnostics while unBALANCE was running. It was a 172GB move that took nearly 50 minutes and completed successfully.

     

    This is what top looked like while the GUI was crashed after an earlier, larger move. This was many hours after the move had finished. No dockers or VMs running:
    [Screenshot 2023-08-20 12:35:16: GUI crashed, hours after running unBALANCE]

    tower-diagnostics-20230820-1400.zip

  12. Thanks to whoever's responsible for the Nicotine+ container. I tried a few times over the years to get it running on Windows and never had any joy. It fired straight up on my Unraid 6.12.2 and all looks good.

     

    A couple of things I noticed:

     

    The main thing is with the download folders specified in the template. I have them both set to reference subfolders within my /mnt/user/Audio/ share, but the files weren't being saved to those directories. I eventually found that they were being saved, but into the container's config/.local/share/nicotine/downloads directory. I fixed this by going into Nicotine+'s preferences, under Downloads, and changing the incomplete and complete folders to the container directories specified by default in the template.
    I also had to go to the Shares preferences and manually add the container's shares folder that the template specifies by default.
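
    For anyone hitting the same thing, you can confirm where the files are actually landing by peeking inside the container (the container name and exact path below are placeholders based on what I found; yours will differ):

    # check the directory the downloads were really going to
    docker exec nicotineplus ls -la /config/.local/share/nicotine/downloads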

     

    On the template: both descriptions for the complete and incomplete downloads are the same.

     

    In the User Interface preferences, the "Prefer dark mode" selection is being ignored.

  13. 4 hours ago, itimpi said:


    If you have mover or appdata backup running, then when either of those is detected you are likely to get the plugin executing the pause but then not doing the resume, so you need a manual resume to continue.

    I have both of those running daily, and although the PCT log entries stopped just after the mover started, the actual rebuild continued and seemingly completed successfully without any interaction on my part.

  14. On 10/8/2020 at 2:56 AM, trurl said:

    On the Dashboard page, click on the SMART warning for the disk and you can acknowledge it. It will warn again if the count increases.

    Thanks for this. There's no way I would've guessed to click on that warning. Good to see all green dots again.

  15. Thanks Dave, that makes things clearer. If only the standard messages were as descriptive as the Parity Check Tuning ones.

     

    I check in on my server most days and try to stay on top of app and plugin updates as soon as they become available. The Parity Check Tuning plugin is indeed on 2023.07.08, and I believe it was updated before I replaced the disk, but I'm not certain.

    Good luck with finding the cause of the monitoring task stopping. In my case all seemed good until the daily mover operation started.

  16. On 7/10/2023 at 8:20 AM, JorgeB said:

    most users won't know about that

     

    I'm still not 100% sure about what's going on with all this 😜


    Here's an update on what happened. I followed the guide to replace the failing disk. The rebuild onto the new disk appears to have gone well, with no errors reported:

    [screenshot attached]

     

    and

     

    [screenshot attached]

     

    What's strange is that there's nothing in the logs at the 10:00 timestamp that the parity result shows as the rebuild end time:

    Jul 10 06:45:11 Tower emhttpd: spinning down /dev/sde
    Jul 10 09:15:08 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 833rpm) to: 205 (80% @ 854rpm)
    Jul 10 09:20:14 Tower autofan: Highest disk temp is 44C, adjusting fan speed from: 205 (80% @ 868rpm) to: 230 (90% @ 834rpm)
    Jul 10 09:39:17 Tower emhttpd: read SMART /dev/sdh
    Jul 10 09:59:53 Tower webGUI: Successful login user root from 192.168.34.42
    Jul 10 10:00:43 Tower kernel: md: sync done. time=132325sec
    Jul 10 10:00:43 Tower kernel: md: recovery thread: exit status: 0
    Jul 10 10:05:23 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 869rpm) to: 205 (80% @ 907rpm)
    Jul 10 10:09:42 Tower emhttpd: spinning down /dev/sdh
    Jul 10 10:14:57 Tower webGUI: Successful login user root from 192.168.34.42
    Jul 10 10:15:29 Tower autofan: Highest disk temp is 42C, adjusting fan speed from: 205 (80% @ 869rpm) to: 180 (70% @ 854rpm)
    Jul 10 10:30:00 Tower webGUI: Successful login user root from 192.168.34.42
    Jul 10 10:30:34 Tower autofan: Highest disk temp is 41C, adjusting fan speed from: 180 (70% @ 850rpm) to: 155 (60% @ 853rpm)
    Jul 10 10:30:44 Tower emhttpd: spinning down /dev/sdg

     

    I can see this in the log when the rebuild starts:

    Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
    Jul  8 21:17:28 Tower Parity Check Tuning: Parity Sync/Data Rebuild detected
    Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring


    Then I get the update every 6 minutes as expected:

    Jul  9 02:24:34 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
    Jul  9 02:30:20 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
    Jul  9 02:36:33 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running


    Until here:

    Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
    Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG:   detected that mdcmd had been called from sh with command mdcmd nocheck PAUSE

     

    Which happens a couple of minutes after this:

    Jul  9 02:40:01 Tower root: mover: started

     

    There are no further parity-related entries after that.
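
    For anyone following along, the parity and rebuild-related lines above can be pulled out of the log with something like:

    # filter the syslog down to the plugin and md driver entries
    grep -E 'Parity Check Tuning|md: (sync|recovery)' /var/log/syslog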

     

    I'm not sure whether I can consider things OK now, or whether I should be investigating further.

  17. My issue with the 2 errors being found during the parity check remains.

     

    I've now got a failing drive and have a new one to replace it with. I've successfully moved everything off the old drive.

     

    I had an unclean shutdown recently, and when Unraid came back up it ran an automatic correcting check, which finished today. This is the result from the log:

     

    Jul 8 03:18:43 Tower Parity Check Tuning: DEBUG: Automatic Correcting Parity-Check running 
    Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584664
    Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584696 
    Jul 8 03:19:25 Tower kernel: md: sync done. time=1844sec 
    Jul 8 03:19:25 Tower kernel: md: recovery thread: exit status: 0

     

    The problem is with the same 2 sectors on parity P that have been coming up since the middle of December, though not every time:

    [screenshot attached]

     

    Both parity drives completed their SMART short self-tests without error.

     

    I'm unsure how best to proceed. My largest data disk is 18TB, the parities are 20TB, and these 2 problem sectors are right at the end of the 20TB, so they're outside the area that holds data, and I've already moved all of the data off the disk I want to replace. Given that, do I just ignore the parity errors and follow this guide: https://docs.unraid.net/unraid-os/manual/storage-management#replacing-a-disk-to-increase-capacity or is there something else I can try?
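
    A quick bit of shell arithmetic backs up the idea that the error sectors sit beyond anything an 18TB data disk covers:

    # byte offset of the reported sector (512-byte sectors) -- roughly 20.0 TB
    echo $(( 39063584664 * 512 ))
    # number of 512-byte sectors in 18 TB -- about 35,156,250,000, well below the reported sector
    echo $(( 18 * 10**12 / 512 ))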

     

    tower-diagnostics-20230708-1356.zip
