
Moving parity drives from external USB to (shucked) internal SATA


-C-
Solved by Kilrah

Recommended Posts

Sync of parity 2 has now completed without error.

 

I find it worrying that Firefox has been an issue with the Unraid GUI since at least v6.11.1, yet no message is displayed to Firefox users warning of potential problems and advising that it's safest to avoid the browser. I've spent many hours on this, and I'd avoided using my server since building it nearly a month ago until this was resolved.


I know that Firefox is a marginal browser nowadays, but I'd imagine the percentage of Firefox users among the Unraid customer base is higher than it is globally. There must be a fair few people like me experiencing weird issues, and the last thing you'd expect in 2022/3 is that the browser you're using could cause technical problems with your server's functionality.

Link to comment
  • 1 month later...

Over the last couple of months I've shut the server down twice. Both times a parity check has auto-started on startup with a message about an unclean shutdown.

 

Each time I stopped the automatically started sync (as I'd had trouble with that before) and started a correcting check again using Edge. The first time the check completed without error, but I've done the same thing again and I'm back to where I was when I started the check with Firefox:

 
Mar 10 18:06:11 Tower Parity Check Tuning: DEBUG:   Manual Correcting Parity-Check running
Mar 10 18:09:37 Tower kernel: md: recovery thread: P corrected, sector=39063584664
Mar 10 18:09:37 Tower kernel: md: recovery thread: P corrected, sector=39063584696
Mar 10 18:09:37 Tower kernel: md: sync done. time=148148sec
Mar 10 18:09:37 Tower kernel: md: recovery thread: exit status: 0
 
These are the same errors on the same sectors as before.

 

I'm at a loss as to what to do now. It appears to not be related to the browser I'm using. Is there anything else I can try?

 

Now a new issue has appeared: when I click the History button under Array Operations, I get a blank box overlaid with no means of closing it, and I have to go to another page and back to view the Main page again. This happens in both Firefox and Edge.

 

Link to comment
11 hours ago, -C- said:

Now a new issue has appeared: when I click the History button under Array Operations, I get a blank box overlaid with no means of closing it, and I have to go to another page and back to view the Main page again. This happens in both Firefox and Edge.

I see you have the Parity Check Tuning plugin installed. This plugin replaces the built-in code for displaying history, so the blank dialog box could be a problem in the plugin.

 

So that I can check this out and see if I can reproduce your symptoms, could you let me have:

  • version number of Unraid you are using
  • version number of the plugin you are using
  • a copy of the config/parity-checks.log file from the flash drive
Link to comment
4 hours ago, JorgeB said:

Unraid will save diagnostics in the /logs folder on the flash drive if it cannot do a clean shutdown. Post those; they might show why that is happening.

Here is the log from the shutdown signal to the last entry before it shut down:

 

Mar  7 14:43:38 Tower  shutdown[8597]: shutting down for system halt
Mar  7 14:43:38 Tower  init: Switching to runlevel: 0
Mar  7 14:43:38 Tower flash_backup: stop watching for file changes
Mar  7 14:43:38 Tower  init: Trying to re-exec init
Mar  7 14:43:59 Tower Parity Check Tuning: DEBUG:   Array stopping
Mar  7 14:43:59 Tower Parity Check Tuning: DEBUG:   No array operation in progress so no restart information saved
Mar  7 14:43:59 Tower kernel: mdcmd (36): nocheck cancel
Mar  7 14:44:00 Tower  emhttpd: Spinning up all drives...
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sdh
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sdg
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sdd
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sde
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sdf
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sdi
Mar  7 14:44:00 Tower  emhttpd: spinning up /dev/sda
Mar  7 14:44:17 Tower kernel: ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Mar  7 14:44:17 Tower kernel: ata5.00: configured for UDMA/133
Mar  7 14:44:17 Tower  emhttpd: sdspin /dev/sdh up: 1
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdj
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdk
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdh
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdg
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdd
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sde
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdb
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdf
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/nvme0n1
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/nvme1n1
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sdi
Mar  7 14:44:17 Tower  emhttpd: read SMART /dev/sda
Mar  7 14:44:17 Tower  emhttpd: Stopping services...
Mar  7 14:44:38 Tower  emhttpd: shcmd (9923955): /etc/rc.d/rc.docker stop
Mar  7 14:44:39 Tower kernel: docker0: port 9(vethb92db4c) entered disabled state
Mar  7 14:44:39 Tower kernel: vetha796224: renamed from eth0
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Interface vethb92db4c.IPv6 no longer relevant for mDNS.
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface vethb92db4c.IPv6 with address fe80::84b5:c3ff:fe35:1c52.
Mar  7 14:44:39 Tower kernel: docker0: port 9(vethb92db4c) entered disabled state
Mar  7 14:44:39 Tower kernel: device vethb92db4c left promiscuous mode
Mar  7 14:44:39 Tower kernel: docker0: port 9(vethb92db4c) entered disabled state
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::84b5:c3ff:fe35:1c52 on vethb92db4c.
Mar  7 14:44:39 Tower kernel: veth520a485: renamed from eth0
Mar  7 14:44:39 Tower kernel: docker0: port 6(vethc2c8bcf) entered disabled state
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Interface vethc2c8bcf.IPv6 no longer relevant for mDNS.
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface vethc2c8bcf.IPv6 with address fe80::f0eb:9cff:fe48:b5f0.
Mar  7 14:44:39 Tower kernel: docker0: port 6(vethc2c8bcf) entered disabled state
Mar  7 14:44:39 Tower kernel: device vethc2c8bcf left promiscuous mode
Mar  7 14:44:39 Tower kernel: docker0: port 6(vethc2c8bcf) entered disabled state
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::f0eb:9cff:fe48:b5f0 on vethc2c8bcf.
Mar  7 14:44:39 Tower kernel: veth359095c: renamed from eth0
Mar  7 14:44:39 Tower kernel: docker0: port 1(veth11635d1) entered disabled state
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Interface veth11635d1.IPv6 no longer relevant for mDNS.
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth11635d1.IPv6 with address fe80::c8d0:34ff:fe40:b86c.
Mar  7 14:44:39 Tower kernel: docker0: port 1(veth11635d1) entered disabled state
Mar  7 14:44:39 Tower kernel: device veth11635d1 left promiscuous mode
Mar  7 14:44:39 Tower kernel: docker0: port 1(veth11635d1) entered disabled state
Mar  7 14:44:39 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::c8d0:34ff:fe40:b86c on veth11635d1.
Mar  7 14:44:43 Tower kernel: docker0: port 8(vethba1f846) entered disabled state
Mar  7 14:44:43 Tower kernel: veth39aff71: renamed from eth0
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface vethba1f846.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface vethba1f846.IPv6 with address fe80::d40f:c0ff:fe86:60e8.
Mar  7 14:44:43 Tower kernel: docker0: port 8(vethba1f846) entered disabled state
Mar  7 14:44:43 Tower kernel: device vethba1f846 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 8(vethba1f846) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::d40f:c0ff:fe86:60e8 on vethba1f846.
Mar  7 14:44:43 Tower kernel: docker0: port 2(vethb13e418) entered disabled state
Mar  7 14:44:43 Tower kernel: veth8acea87: renamed from eth0
Mar  7 14:44:43 Tower kernel: veth82bed5c: renamed from eth0
Mar  7 14:44:43 Tower kernel: docker0: port 5(veth59668a6) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface vethb13e418.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface vethb13e418.IPv6 with address fe80::88b4:78ff:fe8f:4348.
Mar  7 14:44:43 Tower kernel: docker0: port 2(vethb13e418) entered disabled state
Mar  7 14:44:43 Tower kernel: device vethb13e418 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 2(vethb13e418) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::88b4:78ff:fe8f:4348 on vethb13e418.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface veth59668a6.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth59668a6.IPv6 with address fe80::5c8f:c0ff:fe00:838.
Mar  7 14:44:43 Tower kernel: docker0: port 5(veth59668a6) entered disabled state
Mar  7 14:44:43 Tower kernel: device veth59668a6 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 5(veth59668a6) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::5c8f:c0ff:fe00:838 on veth59668a6.
Mar  7 14:44:43 Tower kernel: docker0: port 7(veth3623bf7) entered disabled state
Mar  7 14:44:43 Tower kernel: vethe14a813: renamed from eth0
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface veth3623bf7.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth3623bf7.IPv6 with address fe80::84f7:d3ff:fe68:350b.
Mar  7 14:44:43 Tower kernel: docker0: port 7(veth3623bf7) entered disabled state
Mar  7 14:44:43 Tower kernel: device veth3623bf7 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 7(veth3623bf7) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::84f7:d3ff:fe68:350b on veth3623bf7.
Mar  7 14:44:43 Tower kernel: docker0: port 3(veth2739f34) entered disabled state
Mar  7 14:44:43 Tower kernel: vethb683262: renamed from eth0
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface veth2739f34.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth2739f34.IPv6 with address fe80::4884:77ff:feb7:a969.
Mar  7 14:44:43 Tower kernel: docker0: port 3(veth2739f34) entered disabled state
Mar  7 14:44:43 Tower kernel: device veth2739f34 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 3(veth2739f34) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::4884:77ff:feb7:a969 on veth2739f34.
Mar  7 14:44:43 Tower kernel: docker0: port 4(veth5bc1dc8) entered disabled state
Mar  7 14:44:43 Tower kernel: vethea5fbb3: renamed from eth0
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface veth5bc1dc8.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth5bc1dc8.IPv6 with address fe80::8ae:8eff:fede:a0fe.
Mar  7 14:44:43 Tower kernel: docker0: port 4(veth5bc1dc8) entered disabled state
Mar  7 14:44:43 Tower kernel: device veth5bc1dc8 left promiscuous mode
Mar  7 14:44:43 Tower kernel: docker0: port 4(veth5bc1dc8) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::8ae:8eff:fede:a0fe on veth5bc1dc8.
Mar  7 14:44:43 Tower kernel: br-8038ba180b14: port 1(veth7a733d2) entered disabled state
Mar  7 14:44:43 Tower kernel: veth7f5366a: renamed from eth0
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Interface veth7a733d2.IPv6 no longer relevant for mDNS.
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth7a733d2.IPv6 with address fe80::a89e:7dff:fe9b:6b6.
Mar  7 14:44:43 Tower kernel: br-8038ba180b14: port 1(veth7a733d2) entered disabled state
Mar  7 14:44:43 Tower kernel: device veth7a733d2 left promiscuous mode
Mar  7 14:44:43 Tower kernel: br-8038ba180b14: port 1(veth7a733d2) entered disabled state
Mar  7 14:44:43 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::a89e:7dff:fe9b:6b6 on veth7a733d2.
Mar  7 14:44:48 Tower kernel: docker0: port 10(veth82f62ce) entered disabled state
Mar  7 14:44:48 Tower kernel: veth7af951d: renamed from eth0
Mar  7 14:44:49 Tower  avahi-daemon[10171]: Interface veth82f62ce.IPv6 no longer relevant for mDNS.
Mar  7 14:44:49 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface veth82f62ce.IPv6 with address fe80::c86c:beff:fefd:e3c4.
Mar  7 14:44:49 Tower kernel: docker0: port 10(veth82f62ce) entered disabled state
Mar  7 14:44:49 Tower kernel: device veth82f62ce left promiscuous mode
Mar  7 14:44:49 Tower kernel: docker0: port 10(veth82f62ce) entered disabled state
Mar  7 14:44:49 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::c86c:beff:fefd:e3c4 on veth82f62ce.
Mar  7 14:44:49 Tower root: stopping dockerd ...
Mar  7 14:44:50 Tower root: waiting for docker to die ...
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Interface docker0.IPv6 no longer relevant for mDNS.
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface docker0.IPv6 with address fe80::42:c2ff:fe45:3fc5.
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Interface docker0.IPv4 no longer relevant for mDNS.
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Leaving mDNS multicast group on interface docker0.IPv4 with address 172.17.0.1.
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Withdrawing address record for fe80::42:c2ff:fe45:3fc5 on docker0.
Mar  7 14:44:51 Tower  avahi-daemon[10171]: Withdrawing address record for 172.17.0.1 on docker0.
Mar  7 14:44:51 Tower  emhttpd: shcmd (9923956): umount /var/lib/docker
Mar  7 14:44:52 Tower cache_dirs: Stopping cache_dirs process 4448
Mar  7 14:44:53 Tower cache_dirs: cache_dirs service rc.cachedirs: Stopped
Mar  7 14:45:04 Tower unassigned.devices: Unmounting All Devices...
Mar  7 14:45:04 Tower unassigned.devices: Unmounting partition 'sda2' at mountpoint '/mnt/disks/WD_Green_4TB_714'...
Mar  7 14:45:04 Tower unassigned.devices: Unmount cmd: /sbin/umount -fl '/dev/sda2' 2>&1
Mar  7 14:45:04 Tower  ntfs-3g[15177]: Unmounting /dev/sda2 (WD Green 4TB 714)
Mar  7 14:45:04 Tower unassigned.devices: Successfully unmounted 'sda2'
Mar  7 14:45:04 Tower  sudo: pam_unix(sudo:session): session closed for user root
Mar  7 14:45:05 Tower  emhttpd: shcmd (9923957): /etc/rc.d/rc.samba stop
Mar  7 14:45:05 Tower  wsdd2[9999]: 'Terminated' signal received.
Mar  7 14:45:05 Tower  winbindd[10075]: [2023/03/07 14:45:05.569343,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
Mar  7 14:45:05 Tower  winbindd[10075]:   Got sig[15] terminate (is_parent=1)
Mar  7 14:45:05 Tower  winbindd[10077]: [2023/03/07 14:45:05.569373,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
Mar  7 14:45:05 Tower  winbindd[10077]:   Got sig[15] terminate (is_parent=0)
Mar  7 14:45:05 Tower  winbindd[11433]: [2023/03/07 14:45:05.569416,  0] ../../source3/winbindd/winbindd_dual.c:1957(winbindd_sig_term_handler)
Mar  7 14:45:05 Tower  winbindd[11433]:   Got sig[15] terminate (is_parent=0)
Mar  7 14:45:05 Tower  wsdd2[9999]: terminating.
Mar  7 14:45:05 Tower  emhttpd: shcmd (9923958): rm -f /etc/avahi/services/smb.service
Mar  7 14:45:05 Tower  avahi-daemon[10171]: Files changed, reloading.
Mar  7 14:45:05 Tower  avahi-daemon[10171]: Service group file /services/smb.service vanished, removing services.
Mar  7 14:45:05 Tower  emhttpd: Stopping mover...
Mar  7 14:45:05 Tower  emhttpd: shcmd (9923960): /usr/local/sbin/mover stop
Mar  7 14:45:05 Tower root: mover: not running
Mar  7 14:45:05 Tower  emhttpd: Sync filesystems...
Mar  7 14:45:05 Tower  emhttpd: shcmd (9923961): sync
Mar  7 14:45:06 Tower ProFTPd: Running unmountscript.sh...

 

I checked the log after startup, and can't see anything related to the array until this entry:

Mar  7 20:12:03 Tower Parity Check Tuning: DEBUG:   Automatic Correcting Parity-Check running

 

Link to comment
1 hour ago, itimpi said:

I see you have the Parity Check Tuning plugin installed. This plugin replaces the built-in code for displaying history, so the blank dialog box could be a problem in the plugin.

 

So that I can check this out and see if I can reproduce your symptoms, could you let me have:

  • version number of Unraid you are using
  • version number of the plugin you are using
  • a copy of the config/parity-checks.log file from the flash drive

I'm running Unraid 6.11.1.
I can't see a version number for the plugin, but it's dated 2023.03.01.
Here are all the entries from parity-checks.log:
 

2022 Nov 11 01:35:22|2|0|-4|0|recon P|17578328012
2022 Nov 12 20:18:51|120861|148933137|0|0|recon P|17578328012
2022 Nov 30 22:54:57|128424|140162336|0|0|check P|17578328012
2022 Dec  7 19:33:32|3|0|-4|0|recon Q|19531792332
2022 Dec  8 19:05:46|266984|74.9MB/s|0|0|recon Q|269251|2|AUTOMATIC Parity Sync/Data Rebuild
2022 Dec 10 11:47:13|141961|140887676|0|0|check Q|19531792332
2022 Dec 11 17:04:58|95747|208.9MB/s|0|0|clear|95747|1|AUTOMATIC Disk Clear
2022 Dec 12 19:06:47|116|0|-4|0|recon P|19531792332
2022 Dec 13 22:37:14|2|0|-4|0|recon P|19531792332
2022 Dec 15 00:57:26|252391|79.2MB/s|0|0|recon P|252760|2|AUTOMATIC Parity Sync/Data Rebuild
2022 Dec 21 18:45:10|171468|116.6MB/s|0|2|check P Q|171468|1|MANUAL  Correcting Parity Check
2022 Dec 25 11:04:39|328646|60.9MB/s|0|2|check P Q|338803|2|MANUAL  Non-Correcting Parity Check
2022 Dec 31 00:42:17|2786|0|-4|0|check P Q|19531825100
2023 Jan  1 17:58:32|148553|134.6MB/s|0|2|check P Q|148553|1|MANUAL  Correcting Parity Check
2023 Jan  3 12:49:22|148056|135.1MB/s|0|2|check P Q|148056|1|MANUAL  Correcting Parity Check
2023 Jan  6 05:02:08|423315|47.2MB/s|0|2|check P Q|423648|2|MANUAL  Non-Correcting Parity Check
2023 Jan  7 00:25:47|19|0|-4|0|check P|19531825100
2023 Jan  8 16:31:21|144317|138587893|0|2|check P|19531825100
2023 Jan 10 13:02:42|142749|140.1MB/s|0|0|check P|142749|1|MANUAL  Correcting Parity Check
2023 Jan 10 13:21:39|130|0|-4|0|recon Q|19531825100
2023 Jan 12 17:17:53|60312|331.6MB/s|0|0|recon Q|60312|1|AUTOMATIC Parity Sync/Data Rebuild
2023 Jan 31 06:25:45|145405|137550902|0|2|check P Q|19531825100
2023 Feb  2 19:41:30|153414|130.4MB/s|0|0|check P Q|153414|1|MANUAL  Correcting Parity Check
2023 Mar  8 12:13:04|24865|0|-4|0|check P Q|19531825100
2023 Mar 10 18:09:37|148148|135.0 MB/s|0|2|check P Q|148148|1|Manual Correcting Parity-Check
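
In case a quick way to scan that file is useful, here's a rough Python sketch. The field meanings are only my guess from eyeballing the entries above (date | duration in seconds | speed | exit status | sync errors | operation | remaining fields), not an official format, so treat it purely as illustrative:

#!/usr/bin/env python3
# Rough, illustrative scanner for a copy of config/parity-checks.log.
# Field meanings are guessed from the entries above, not an official spec:
#   date | duration (s) | speed | exit status | sync errors | operation | remaining fields
# (Parity Check Tuning entries appear to append elapsed time, increment count and a
#  description; plain Unraid entries appear to end with a size figure instead.)

FIELDS = ["date", "duration_s", "speed", "exit", "errors", "operation"]

with open("parity-checks.log") as f:
    for line in f:
        parts = line.rstrip("\n").split("|")
        if len(parts) < len(FIELDS):
            continue                              # not a recognisable entry
        entry = dict(zip(FIELDS, parts))
        entry["extra"] = parts[len(FIELDS):]
        if entry["exit"] != "0":
            print(f"aborted? {entry['date']}  {entry['operation']}  exit={entry['exit']}")
        elif entry["errors"] != "0":
            print(f"{entry['errors']} sync errors: {entry['date']}  {entry['operation']}")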

 

Link to comment
  • 3 months later...

My issue with the 2 errors being found during parity check remains.

 

I've now got a failing drive and have a new one to replace it with. I've successfully moved everything off the old drive.

 

I had an unclean shutdown recently, and when Unraid came back up it ran an automatic correcting check, which finished today. This is the result from the log:

 

Jul 8 03:18:43 Tower Parity Check Tuning: DEBUG: Automatic Correcting Parity-Check running 
Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584664
Jul 8 03:19:25 Tower kernel: md: recovery thread: P corrected, sector=39063584696 
Jul 8 03:19:25 Tower kernel: md: sync done. time=1844sec 
Jul 8 03:19:25 Tower kernel: md: recovery thread: exit status: 0

 

The problem is with the same 2 sectors on parity P that have been coming up as bad, though not every time, since the middle of December:

[screenshot attachment]

 

Both parity drives completed their SMART short self-tests without error.

 

I'm unsure how best to proceed. My largest data disk is 18TB, the parity drives are 20TB, and these 2 problem sectors are right at the end of the 20TB, so outside the area that holds data, and I have already moved all of the data off the disk I want to replace. Do I just ignore the parity errors and then follow this guide: https://docs.unraid.net/unraid-os/manual/storage-management#replacing-a-disk-to-increase-capacity or is there something else I can try?
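
For what it's worth, a quick back-of-the-envelope check (assuming the "sector=" values in the md log are standard 512-byte sectors, which I haven't confirmed) does put those two sectors right around the 20TB mark, well past the ~18TB covered by the largest data disk:

# Back-of-the-envelope check that the two flagged sectors sit beyond the data area.
# Assumes the "sector=" values reported by md are standard 512-byte sectors.

SECTOR_BYTES = 512
flagged = [39063584664, 39063584696]          # from the syslog lines above

largest_data_disk_bytes = 18_000_000_000_000  # ~18TB nominal capacity

for s in flagged:
    offset = s * SECTOR_BYTES
    where = "beyond" if offset > largest_data_disk_bytes else "within"
    print(f"sector {s}: ~{offset / 1e12:.3f} TB into the 20TB parity, {where} the ~18TB data area")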

 

tower-diagnostics-20230708-1356.zip

Link to comment
1 hour ago, trurl said:

Is that a "feature" of Parity Check Tuning plugin?

No. The Parity Check Tuning plugin never initiates a parity check from the beginning; it relies on Unraid to do that, and the plugin then handles pause/resume.
 

The only time the plugin initiates anything is when there was an array operation in progress at the time of the shutdown AND you have set the option to restart operations from the point reached AND the shutdown was a clean shutdown. Even then, whether it is correcting or non-correcting depends on what it was before the shutdown.
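
Expressed as a rough sketch (just an illustration of those rules, not the plugin's actual code):

# Illustration only of the restart rules described above -- not the plugin's real code.
def plugin_restarts_operation(op_in_progress_at_shutdown: bool,
                              restart_option_enabled: bool,
                              shutdown_was_clean: bool) -> bool:
    # All three conditions must be true before the plugin restarts anything.
    return op_in_progress_at_shutdown and restart_option_enabled and shutdown_was_clean

def restarted_check_is_correcting(was_correcting_before_shutdown: bool) -> bool:
    # Correcting vs non-correcting is simply carried over from before the shutdown.
    return was_correcting_before_shutdown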
 

Starting a correcting parity check after an unclean shutdown looks like new behaviour at the Unraid level. I am sure this check used to be non-correcting, so I am not sure whether this is a bug or by design.

  • Like 1
Link to comment
34 minutes ago, itimpi said:

Starting a correcting parity check after an unclean shutdown looks like new behaviour at the Unraid level.

I didn't know this had changed, but after an unclean shutdown I do prefer a correcting check; some sync errors are usually normal, so you might as well correct them on the first pass.

Link to comment
2 hours ago, JorgeB said:

I didn't know this had changed, but after an unclean shutdown I do prefer a correcting check; some sync errors are usually normal, so you might as well correct them on the first pass.

If you really want it to correct, you can just cancel the non-correcting check and manually run it as correcting.

 

I can imagine scenarios (bad RAM?) that might result in an unclean shutdown where you wouldn't want it to change parity.

Link to comment
15 hours ago, trurl said:

you can just cancel the non-correcting check and manually run it as correcting.

Yeah, but most users won't know about that, so they will either wait for it to finish and then run another one, or assume that the errors were corrected. There are good arguments for doing it either way.

Link to comment
On 7/10/2023 at 8:20 AM, JorgeB said:

most users won't know about that

 

I'm still not 100% sure about what's going on with all this 😜


Here's an update on what happened. I followed the guide to replace the failing disk. The rebuild onto the new disk appears to have gone well, with no errors reported:

[screenshot attachment]

 

and

 

[screenshot attachment]

 

What's strange is that there's nothing in the logs at the 10:00 timestamp that the parity result shows as the rebuild end time:

Jul 10 06:45:11 Tower emhttpd: spinning down /dev/sde
Jul 10 09:15:08 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 833rpm) to: 205 (80% @ 854rpm)
Jul 10 09:20:14 Tower autofan: Highest disk temp is 44C, adjusting fan speed from: 205 (80% @ 868rpm) to: 230 (90% @ 834rpm)
Jul 10 09:39:17 Tower emhttpd: read SMART /dev/sdh
Jul 10 09:59:53 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:00:43 Tower kernel: md: sync done. time=132325sec
Jul 10 10:00:43 Tower kernel: md: recovery thread: exit status: 0
Jul 10 10:05:23 Tower autofan: Highest disk temp is 43C, adjusting fan speed from: 230 (90% @ 869rpm) to: 205 (80% @ 907rpm)
Jul 10 10:09:42 Tower emhttpd: spinning down /dev/sdh
Jul 10 10:14:57 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:15:29 Tower autofan: Highest disk temp is 42C, adjusting fan speed from: 205 (80% @ 869rpm) to: 180 (70% @ 854rpm)
Jul 10 10:30:00 Tower webGUI: Successful login user root from 192.168.34.42
Jul 10 10:30:34 Tower autofan: Highest disk temp is 41C, adjusting fan speed from: 180 (70% @ 850rpm) to: 155 (60% @ 853rpm)
Jul 10 10:30:44 Tower emhttpd: spinning down /dev/sdg

 

I can see this in the log when the rebuild starts:

Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
Jul  8 21:17:28 Tower Parity Check Tuning: Parity Sync/Data Rebuild detected
Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring


Then I get the update every 6 minutes as expected:

Jul  9 02:24:34 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
Jul  9 02:30:20 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
Jul  9 02:36:33 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running


Until here:

Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
Jul  9 02:42:20 Tower Parity Check Tuning: DEBUG:   detected that mdcmd had been called from sh with command mdcmd nocheck PAUSE

 

Which happens a couple of minutes after this:

Jul  9 02:40:01 Tower root: mover: started

 

There are no further parity-related entries after that.

 

I'm not sure whether I can consider things OK now, or whether I should be investigating further.

Link to comment
1 hour ago, -C- said:

What's strange is that there's nothing in the logs at the 10:00 timestamp that the parity result shows as the rebuild end time:

Yes there is - there are the standard Unraid messages written when an array operation completes, which look like this:

Jul 10 10:00:43 Tower kernel: md: sync done. time=132325sec
Jul 10 10:00:43 Tower kernel: md: recovery thread: exit status: 0

 

The messages that look like this:

Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Parity Sync/Data Rebuild running
Jul  8 21:17:28 Tower Parity Check Tuning: Parity Sync/Data Rebuild detected
Jul  8 21:17:28 Tower Parity Check Tuning: DEBUG:   Created cron entry for 6 minute interval monitoring

are from the Parity Check Tuning plugin, which is not a standard part of Unraid. The plugin currently has an issue I do not yet understand, where the monitor task seems to stop running for no apparent reason. The latest version of the plugin is 2023.07.08, but I suspect you did not have that installed at the time.

Link to comment

Thanks Dave - that makes things clearer. If only the standard messages were as descriptive as the Parity Check Tuning ones.

 

I check in on my server most days and try to stay on top of app & plugin updates as soon as they become available. The Parity Check Tuning plugin is indeed on 2023.07.08, and I believe it was updated before I replaced the disk, but I'm not certain.

Good luck with finding the cause of the monitoring task stopping. In my case all seemed good until the daily mover operation started.

Link to comment
8 hours ago, -C- said:

Good luck with finding the cause of the monitoring task stopping. In my case all seemed good until the daily mover operation started.


The (undesirable) side-effect of the monitor task stopping is that if you have the plugin set to pause a parity check while mover or appdata backup is running, then when either of those is detected the plugin is likely to execute the pause but never do the resume, so you need a manual resume to continue.

Link to comment
4 hours ago, itimpi said:


if you have the plugin set to pause a parity check while mover or appdata backup is running, then when either of those is detected the plugin is likely to execute the pause but never do the resume, so you need a manual resume to continue.

I have both of those running daily, and although the PCT log entries stopped just after the mover started, the actual rebuild continued and seemingly completed successfully without any interaction on my part.

Link to comment
