Parity check says 300 errors, but disk logs show zero errors.


xrqp


53 minutes ago, xrqp said:

Should I just ignore the parity check saying there are errors?

Exactly zero is the only acceptable result. If parity is out of sync, how can you expect to accurately rebuild a failed disk?
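A toy model makes the point concrete. Single (P) parity is a byte-wise XOR across the data disks, and a missing disk is rebuilt by XOR-ing parity with every surviving disk. This is a simplified illustration under that assumption (three tiny "disks", hypothetical names and data), not Unraid's actual md driver:

```python
from functools import reduce


def parity(disks):
    """P parity: XOR the bytes at each position across all disks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


def rebuild(surviving, p):
    """Recover the one missing disk: XOR parity with every surviving disk."""
    return parity(list(surviving) + [p])


# Three toy "disks", three bytes each (hypothetical data).
disks = [b"\x10\x20\x30", b"\x01\x02\x03", b"\xaa\xbb\xcc"]
p = parity(disks)

# With parity in sync, losing disk 1 is fully recoverable.
assert rebuild([disks[0], disks[2]], p) == disks[1]

# One out-of-sync parity byte (an uncorrected sync error) ...
stale = bytes([p[0] ^ 0x01]) + p[1:]
# ... silently corrupts the matching byte of the rebuilt disk.
print(rebuild([disks[0], disks[2]], stale))  # → b'\x00\x02\x03'
```

The rebuild still "succeeds" and nothing flags the wrong byte, which is why anything other than zero sync errors matters.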

 

An unclean shutdown will often cause a small number of sync errors, but it looks like the current bootup was clean. Perhaps you had an unclean shutdown earlier which resulted in these sync errors, and you had not corrected them yet.

 

I see this was a correcting parity check that started at midnight on the 20th, so I assume this was a scheduled check. Scheduled checks should be non-correcting. You should only do a correcting parity check after you have already determined there are parity sync errors, and you are confident there isn't some other cause such as disk problems. You don't want some hardware issue to change and possibly corrupt parity.

 

After correcting parity sync errors, you should run a non-correcting parity check, without rebooting, to confirm there are now exactly zero parity sync errors remaining. If there are still parity errors then you must have some other problem causing that, and comparing the parity checks in syslog could help figure that out.


So syslog is where I can see the errors listed?

I got syslog by going to "Tools" then "syslog".

Is syslog also in the diagnostics zip? I cannot find it in the diagnostics zip.

 

In syslog it says:

  1. I had about 30 red errors for "no such file or directory". Those do not worry me; they are not like a disk error. They are probably there because I set up Sonarr imperfectly.
  2. I also had about 30 green log entries saying "Tower kernel: md: recovery thread: P corrected, sector=4015656792", each with a different sector number.
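Those green entries can be pulled out of a syslog dump with a short script. This is a sketch that assumes only the two line forms quoted in this thread (`P corrected` and `P incorrect`); the function name is hypothetical, and the syslog text would come from "Tools" then "syslog", or from the logs folder of the diagnostics zip.

```python
import re

# Matches the md lines quoted in this thread, e.g.
# "Tower kernel: md: recovery thread: P corrected, sector=4015656792"
MD_SYNC_ERROR = re.compile(
    r"md: recovery thread: (\w+) (corrected|incorrect), sector=(\d+)"
)


def sync_error_sectors(syslog_text):
    """Return (parity_label, action, sector) for every sync-error line found."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in MD_SYNC_ERROR.finditer(syslog_text)
    ]


sample = """\
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608
Apr 25 06:07:49 Tower kernel: md: recovery thread: stopped logging
"""
print(sync_error_sectors(sample))
# → [('P', 'corrected', 4015656792), ('P', 'incorrect', 4015657608)]
```

In this thread, `corrected` lines came from a correcting check (parity was rewritten) and `incorrect` lines from a non-correcting check (the mismatch was only reported).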

Based on your reply, I clicked on "scheduler", then changed "Write corrections to parity disk" from yes to no.  (Unraid should give better help on that one).  I do auto parity checks every 3 months.  

 

I guess I am ok then, and can just wait until the next auto parity check in 3 months.  Thanks for your reply and help.

1 minute ago, xrqp said:

Is syslog also in the diagnostics zip? I cannot find it in the diagnostics zip.

In the logs folder of the diagnostics zip

 

2 minutes ago, xrqp said:

I guess I am ok

No

58 minutes ago, trurl said:

After correcting parity sync errors, you should run a non-correcting parity check, without rebooting, to confirm there are now exactly zero parity sync errors remaining. If there are still parity errors then you must have some other problem causing that, and comparing the parity checks in syslog could help figure that out

 

5 hours ago, xrqp said:

I did not correct any errors because I could not find any.

Not sure what you mean there. Do you mean you didn't notice any hardware problems? I didn't either but I didn't look through all of those diagnostics.

 

Did any of your disks have non-zero in the Errors column on Main? Do any of your disks have SMART warnings on the Dashboard page? Do you have Notifications setup to alert you immediately by email or other agent as soon as a problem is detected?


As of now (April 26), after the last parity check, the "Errors" column on the Main page shows all zeros, the Dashboard shows "healthy" for all disks, and I attached a new diagnostics file: tower-diagnostics-20220426-1935.zip

 

When I wrote on April 25th that I could not find errors, it was based on checking syslog and also the SMART report for every disk.

 

Based on the emails below, it looks like this happened over the week:

Parity check - result

     4/21 - 278 errors

     4/25 - pass

     4/26 - 256 errors

I was mixed up before when I started this thread; I thought the 278 errors were on 4/25.

 

Here are the last 2 emails I got when I ran parity checks. April 25 ran with correction. April 26 ran without correction.

The April 25 email included info on the previous parity check on April 21 with 278 errors.

 

EMAIL Sent: Monday, April 25, 2022 12:20 AM
Subject: Unraid Status: Notice [TOWER] - array health report [PASS]

Event: Unraid Status

Subject: Notice [TOWER] - array health report [PASS]

Description: Array has 9 disks (including parity & cache)

Importance: normal

Parity - ST18000NM000J-2TV103_ZR52HS8A (sdg) - active 40 C [OK]
Disk 1 - ST18000NM000J-2TV103_ZR52VKR5 (sdk) - active 37 C [OK]
Disk 2 - ST18000NM000J-2TV103_ZR52BE8Q (sdi) - standby [OK]
Disk 3 - ST8000DM004-2CX188_ZCT06E5C (sdd) - standby [OK]
Disk 4 - ST8000DM004-2CX188_WCT0VFC8 (sdh) - active 31 C [OK]
Disk 5 - ST8000DM004-2CX188_ZR119PYR (sde) - active 30 C [OK]
Disk 6 - ST18000NM000J-2TV103_ZR52HRY7 (sdf) - standby [OK]
Disk 7 - WDC_WD80EFAX-68LHPN0_7HKBDRTF (sdj) - standby [OK]
Cache - CT1000P1SSD8_1842E1D21EFE (nvme0n1) - active 38 C [OK]

Parity is valid

Last checked on Thu 21 Apr 2022 10:50:21 AM PDT (4 days ago), finding 278 errors.

Duration: 1 day, 10 hours, 50 minutes, 20 seconds. Average speed: 143.5 MB/s

 

EMAIL Sent: Tuesday, April 26, 2022 1:30 PM
Subject: Unraid Status: Notice [TOWER] - Parity check finished (256 errors)
Importance: High

Event: Unraid Parity check

Subject: Notice [TOWER] - Parity check finished (256 errors)

Description: Duration: 1 day, 12 hours, 27 minutes, 50 seconds. Average speed: 137.1 MB/s

Importance: warning
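The "Average speed" in both emails is consistent with simply dividing the parity drive's capacity by the check duration. A quick check, assuming the 18 TB parity drive (ST18000NM000J, taken as 18 × 10^12 bytes nominal) sets the length of the check and that MB means 10^6 bytes; the function name is hypothetical:

```python
def average_speed_mb_s(size_bytes, days=0, hours=0, minutes=0, seconds=0):
    """Average throughput in MB/s (1 MB = 10**6 bytes) over a whole check."""
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return size_bytes / total_seconds / 10**6


# Assumption: the check covers the full 18 TB (18 * 10**12 bytes) parity drive.
PARITY_BYTES = 18 * 10**12

print(round(average_speed_mb_s(PARITY_BYTES, 1, 10, 50, 20), 1))  # → 143.5 (check ending Apr 21)
print(round(average_speed_mb_s(PARITY_BYTES, 1, 12, 27, 50), 1))  # → 137.1 (check ending Apr 26)
```

Both results match the emails, which is consistent with each check having run across the whole parity drive rather than stopping early.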

11 hours ago, xrqp said:

 4/21    -     278 errors

This was a correcting check

Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656800
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656808
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656816
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656824
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656832
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656840
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656848
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656856
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656864
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656872
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656880
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656888
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656896
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656904
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656912
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656920
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656928
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656936
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656944
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656952
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656960
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656968
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656976
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656984
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656992
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657000
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657008
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657016
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657024
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657032
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657040
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657048
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657056
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657064
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657072
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657080
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657088
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657096
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657104
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657112
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657120
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657128
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657136
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657144
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657152
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657160
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657168
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657176
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657184
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657192
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657200
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657208
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657216
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657224
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657232
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657240
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657248
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657256
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657264
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657272
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657280
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657288
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657296
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657304
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657312
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657320
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657328
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657336
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657344
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657352
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657360
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657368
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657376
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657384
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657392
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657400
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657408
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657416
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657424
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657432
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657440
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657448
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657456
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657464
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657472
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657480
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657488
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657496
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657504
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657512
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657520
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657528
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657536
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657544
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657552
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657560
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657568
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657576
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657584
Apr 20 03:28:08 Tower kernel: md: recovery thread: stopped logging

 

 

11 hours ago, xrqp said:

4/25    -    pass

I don't see this parity check in syslog; where are you seeing it? Screenshot? Perhaps you mean the Array Health email you received on the 25th, which did say PASS, but also showed the parity sync errors from the check that finished on the 21st.

 

There was a non-correcting check that started on 4/25, but it is the one that completed on 4/26.

11 hours ago, xrqp said:

 4/26    -     256 errors.

Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657616
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657624
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657632
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657640
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657648
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657656
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657664
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657672
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657680
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657688
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657696
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657704
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657712
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657720
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657728
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657736
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657744
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657752
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657760
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657768
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657776
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657784
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657792
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657800
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657808
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657816
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657824
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657832
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657840
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657848
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657856
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657864
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657872
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657880
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657888
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657896
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657904
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657912
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657920
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657928
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657936
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657944
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657952
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657960
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657968
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657976
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657984
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657992
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658000
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658008
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658016
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658024
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658032
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658040
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658048
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658056
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658064
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658072
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658080
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658088
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658096
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658104
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658112
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658120
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658128
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658136
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658144
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658152
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658160
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658168
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658176
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658184
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658192
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658200
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658208
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658216
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658224
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658232
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658240
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658248
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658256
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658264
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658272
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658280
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658288
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658296
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658304
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658312
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658320
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658328
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658336
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658344
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658352
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658360
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658368
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658376
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658384
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658392
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658400
Apr 25 06:07:49 Tower kernel: md: recovery thread: stopped logging

 

 

If you compare those, different sectors were incorrect each time.

 

This strongly suggests BAD RAM.

 

Have you done memtest?
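The comparison of the two excerpts can be done mechanically by treating each one as a set of sector numbers. Keep in mind that each excerpt holds only the entries logged before `stopped logging` (100 per check above), not all 278 or 256 errors, so this compares the logged portions only. The helper name is hypothetical.

```python
def compare_checks(first, second):
    """Count sectors flagged only by one check versus by both."""
    a, b = set(first), set(second)
    return {"only_first": len(a - b), "only_second": len(b - a), "common": len(a & b)}


# Logged sectors from the two excerpts; every entry is 8 sectors after the previous one.
apr20_logged = range(4015656792, 4015657584 + 1, 8)  # correcting check ("P corrected")
apr25_logged = range(4015657608, 4015658400 + 1, 8)  # non-correcting check ("P incorrect")

print(compare_checks(apr20_logged, apr25_logged))
# → {'only_first': 100, 'only_second': 100, 'common': 0}
```

With real logs, the inputs would be the sector numbers taken from each check's own syslog entries rather than reconstructed ranges.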


Regarding the 4/25 email, I did think the health report was the same as a parity check. I realize now they are different. Thanks.

I will check forum for how to do memtest in unraid.

 

Could the errors be caused by too much going on at one time? I may have been transferring files from a Windows 10 PC to Unraid, downloading TV shows from an antenna, downloading with Sonarr and SABnzbd over the web, and running the Roon docker.

 

Should I run a correcting parity check?

 

It is hard to run a parity check without corrections because there are 2 places where correcting is set. In the scheduler I changed it to not correct, but in the Main window, near the bottom, next to Parity check, there is a little "correcting" checkbox. I uncheck it, but when the screen refreshes, the check comes back by itself.

Just now, xrqp said:

Could the errors be caused by too much going on at one time? I may have been transferring files from a Windows 10 PC to Unraid, downloading TV shows from an antenna, downloading with Sonarr and SABnzbd over the web, and running the Roon docker.


It should not make any difference, but perhaps if the system is heavily loaded, something like power supply loading might become more of an issue.


I will see if I can put a watt meter on it, stress the workload, and see if I exceed my power supply, which I think is 400 W (I need to check whether it is a 350 or 400 W PSU). When I had 5 drives it peaked at about 90 W and idled at 35 W, but I did not try hard to max out the CPU.


If 2 consecutive non-correcting checks come back with different error addresses, you shouldn't run a correcting check until you figure out why you are getting errors and fix the issue.

 

As was already stated, the most common cause for this type of thing is unreliable RAM, but it can also be caused by unstable power or a number of other issues. Until you get back-to-back identical checks, you still have an issue you need to sort out by any means possible.

 

This also means that you can't rely on the data you read from the array being correct, and you especially should avoid writing to the array; who knows if the data you write will be accurately stored.

 

You must get to the point of repeatable parity checks with zero errors before you can trust the server again.


You shouldn't even attempt to run a computer if the RAM is bad. Everything goes through RAM, the OS code and any other programs, your data, everything. The CPU can't execute any instructions until they are loaded into RAM, and can't do anything with any data until it is loaded into RAM.

 

After you get RAM tested and fixed, or determine that it is OK, don't correct parity, or indeed write to any disk, until you get identical consecutive non-correcting parity checks.

 

After you get identical non-correcting parity checks, you can correct parity, then run another non-correcting parity check to verify there are zero parity sync errors.
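The sequence described in the last few replies can be summarized as a small decision function. This is a hypothetical sketch that only encodes the advice in this thread (it is not an Unraid feature); each set is assumed to hold the sector numbers a non-correcting check reported as `P incorrect`, counting only checks run since the last correction.

```python
def next_step(ram_ok, previous, latest):
    """Suggest the next action from the two most recent non-correcting checks.

    ram_ok   -- True once memtest has passed (or faulty RAM was replaced)
    previous -- set of flagged sectors from the earlier check (None if not run)
    latest   -- set of flagged sectors from the most recent check (None if not run)
    """
    if not ram_ok:
        return "test RAM first; do not correct parity or write to the array"
    if previous is None or latest is None:
        return "run another non-correcting check"
    if previous != latest:
        return "results differ: find the cause before correcting anything"
    if latest:
        return "run one correcting check, then non-correcting checks to verify"
    return "parity is in sync: repeated checks show zero errors"


print(next_step(True, {4015656792}, {4015657608}))
# → results differ: find the cause before correcting anything
```

Comparing whole sets, not just error counts, matters: two checks can report similar totals while flagging entirely different sectors.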

 

Do you have backups of anything important and irreplaceable? You should even if everything is working well, which it isn't.

 

On 4/24/2022 at 5:09 PM, xrqp said:

Should I just ignore the parity check saying there are errors?

No.

 

There are errors.

 

I'm a newbie, only using Unraid for a year or two, but I've found its error reporting to be better than most (I've only used Windows and Linux [mostly Arch]).

 

Do you trust Unraid (as an OS) to give errors when it should?

 

I think you have a parity error (multiples don't matter).

 

7 hours ago, xrqp said:

It is hard to do parity check without corrections as there are 2 places it is set to correct. 

 

Why would you parity check and not correct?... Teach me (if you can reduce yourself to my level).

 

Mr. Grey

 

5 hours ago, MrGrey said:

Why would you parity check and not correct?

Correcting involves reading the data disks and writing data to the parity drive. Until you have completely repeatable parity checks returning identical results on subsequent runs,

12 hours ago, JonathanM said:

you can't rely on the data you read from the array being correct, and you especially should avoid writing to the array; who knows if the data you write will be accurately stored.

 

6 hours ago, MrGrey said:

Why would you parity check and not correct?

Because problems with other disks or other hardware could corrupt parity if you allow it to be "corrected".

 

In this particular thread, OP has sync errors that aren't repeatable, suggesting bad RAM or other issues.
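A toy model (XOR parity on single bytes, hypothetical values) shows how a correcting check can do harm: if a transient fault such as bad RAM corrupts a value as it is read, the check "corrects" good parity into bad parity, and a later rebuild inherits the error. A non-correcting check would only report the mismatch and leave parity untouched.

```python
def xor_all(values):
    """P parity of one byte position: XOR of the values on every data disk."""
    result = 0
    for value in values:
        result ^= value
    return result


data = [0x10, 0x01, 0xAA]  # one byte per data disk (hypothetical values)
parity = xor_all(data)     # parity starts out correct

# Transient fault: disk 2's byte arrives with one bit flipped.
as_read = [data[0], data[1], data[2] ^ 0x04]

# A correcting check trusts what it read and rewrites parity to match it.
parity_after_correcting = xor_all(as_read)

# Later, disk 1 fails and is rebuilt from disks 0 and 2 plus parity.
rebuilt = xor_all([data[0], data[2], parity_after_correcting])
print(hex(rebuilt), "instead of", hex(data[1]))  # → 0x5 instead of 0x1
```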

13 hours ago, xrqp said:

It is hard to do parity check without corrections as there are 2 places it is set to correct.

Settings - Scheduler controls whether or not scheduled (not user-initiated) parity checks are correcting. You have already set this to non-correcting. You can forget about scheduled checks until you get everything working well again. In fact, you might as well disable them for now.

 

In Main - Array Operations, you can start a (user-initiated) parity check and choose whether or not to correct parity errors. You should not correct parity until you get everything working well again.

 

You said you run parity checks every 3 months, but you didn't say whether the previous scheduled check had zero sync errors. Possibly you have had a serious problem for some time now.

 

10 hours ago, trurl said:

Do you have backups of anything important and irreplaceable? You should even if everything is working well, which it isn't.

 

First thing is to test memory.


I read that I can run memtest from the boot menu. To see the boot menu, I hook up a monitor and keyboard, then reboot. I will try to run memtest next week.

 

I tried to figure out whether I have UEFI or not by looking on the flash drive for a folder named efi~ or efi, so I can make sure it is efi~. But when I use Krusader and go to Root/FLASH, it says I cannot open that folder.
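To see how the running system actually booted, a Linux kernel exposes `/sys/firmware/efi` only when it was started via UEFI. Assuming Unraid follows standard Linux behavior here and Python is available (otherwise, simply check from the terminal whether that directory exists), a minimal check is below. It reports how the current boot happened, not how the flash drive is configured; the function name is hypothetical.

```python
import os


def booted_in_uefi_mode(sysfs_root="/sys"):
    """True if the kernel was booted via UEFI (it then exposes <sysfs>/firmware/efi)."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


print("UEFI boot" if booted_in_uefi_mode() else "legacy BIOS boot")
```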

On 4/27/2022 at 10:09 PM, trurl said:

Everything goes through RAM, the OS code and any other programs, your data, everything. The CPU can't execute any instructions until they are loaded into RAM, and can't do anything with any data until it is loaded into RAM.

What typically happens with bad RAM is some bits will be wrong, which makes everything else wrong. Wrong instructions don't just work poorly, they don't work at all. Wrong data can become permanently wrong when it is saved. Saved corrupt data could be the filesystem metadata, which makes it impossible to even retrieve the good data.


I ran memtest for about 24 hours. When I stopped it, it said "Pass:10 Errors:0". It was running test #9 when I stopped it, so I do not know what "Pass:10" means. I think I passed memtest, so I am now running Unraid again. What do I do now?


It's strange that the sync errors were all detected in a similar zone and are all sequential; this makes me not suspect RAM:

 

1st check


 

Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656800
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656808
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656816
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656824

etc

 

2nd check

 

Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657616
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657624
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657632
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657640
etc
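The "similar zone, all sequential" observation can also be checked from the excerpts. The sketch below groups sector numbers into runs spaced exactly 8 apart (the spacing seen in every logged entry) and converts the first sector to a byte offset. The 512-byte sector size is an assumption about how md reports sectors, the function name is hypothetical, and the last logged sector of the first check is not its last error, since logging stopped after 100 entries.

```python
SECTOR_BYTES = 512  # assumption: md sector numbers are in 512-byte units


def contiguous_runs(sectors, step=8):
    """Group unique sector numbers into (first, last) runs spaced exactly `step` apart."""
    runs = []
    for s in sorted(sectors):
        if runs and s - runs[-1][1] == step:
            runs[-1][1] = s        # extend the current run
        else:
            runs.append([s, s])    # start a new run
    return [tuple(r) for r in runs]


# Logged sectors from the two excerpts (each entry is 8 sectors after the previous one).
apr20 = range(4015656792, 4015657584 + 1, 8)
apr25 = range(4015657608, 4015658400 + 1, 8)

print(contiguous_runs(list(apr20) + list(apr25)))
# → [(4015656792, 4015657584), (4015657608, 4015658400)]
print(round(apr20[0] * SECTOR_BYTES / 1e12, 3), "TB offset")  # → 2.056 TB offset
```

The two logged runs sit only 24 sectors apart, roughly 2 TB into the disks, which is the clustering JorgeB points out.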

 

Since you rebooted, run a correcting check, then a non-correcting one, and post new diags. If it's a disk issue, it's much more difficult to identify the culprit.


I will run a correcting parity check and get diagnostics, then run a non-correcting parity check and get diagnostics. Each parity check takes about 2 days.

 

One problem I have is that my internet is DSL on a very noisy copper phone line. I have voice phone service (POTS) on the same line, and it is so noisy we cannot use it for voice anymore.
