xrqp Posted April 25, 2022
Should I just ignore the parity check saying there are errors?
Attached: tower-diagnostics-20220424-1703.zip
I saw the little warning box saying I had about 300 errors (rounded) after the parity check. Is there a log or file where I can see those errors listed? Where?
trurl Posted April 25, 2022
53 minutes ago, xrqp said: "Should I just ignore the parity check saying there are errors?"
Exactly zero is the only acceptable result. If parity is out of sync, how can you expect to accurately rebuild a failed disk?
An unclean shutdown will often cause a small number of sync errors, but it looks like the current bootup was clean. Perhaps you had an unclean shutdown earlier which resulted in these sync errors and you had not corrected them yet.
I see this was a correcting parity check that started at midnight on the 20th, so I assume this was a scheduled check. Scheduled checks should be non-correcting. You should only do a correcting parity check after you have already determined there are parity sync errors, and you are confident there isn't some other cause such as disk problems. You don't want some hardware issue to change and possibly corrupt parity.
After correcting parity sync errors, you should run a non-correcting parity check, without rebooting, to confirm there are now exactly zero parity sync errors remaining. If there are still parity errors then you must have some other problem causing that, and comparing the parity checks in syslog could help figure that out.
xrqp Posted April 25, 2022 Author
So syslog is the place where I can see the errors listed? I got syslog by going to "Tools" then "syslog". Is syslog also in diagnostic.zip? I cannot find it in diagnostic.zip.
In syslog I had about 30 red errors for "no such file or directory". Those do not worry me, since they are not like a disk error. They are probably because I set up Sonarr imperfectly.
I also had about 30 green log entries saying "Tower kernel: md: recovery thread: P corrected, sector=4015656792", each with a different sector number.
Based on your reply, I clicked on "Scheduler", then changed "Write corrections to parity disk" from yes to no. (Unraid should give better help on that one.) I do auto parity checks every 3 months. I guess I am ok then, and can just wait until the next auto parity check in 3 months.
Thanks for your reply and help.
trurl Posted April 25, 2022
1 minute ago, xrqp said: "Is syslog also in diagnostic.zip? I cannot find it in diagnostic.zip."
In the logs folder of the diagnostics zip.
2 minutes ago, xrqp said: "I guess I am ok"
No.
58 minutes ago, trurl said: "After correcting parity sync errors, you should run a non-correcting parity check, without rebooting, to confirm there are now exactly zero parity sync errors remaining. If there are still parity errors then you must have some other problem causing that, and comparing the parity checks in syslog could help figure that out"
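For anyone who would rather locate the syslog inside a diagnostics zip with a script than by clicking through folders, here is a minimal Python sketch. It relies only on what is said above (the syslog sits in a logs folder); the exact file names inside the zip are not confirmed in this thread, so the match on "syslog" in the file name is an assumption, and `find_syslogs` is a made-up helper name.

```python
import zipfile


def find_syslogs(zip_file):
    """Return names of zip members that look like syslog files.

    Assumption (not confirmed in the thread): the file sits under a
    folder called "logs" and has "syslog" somewhere in its file name.
    zip_file may be a path or an open binary file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        matches = []
        for name in archive.namelist():
            lowered = name.lower()
            base_name = lowered.rsplit("/", 1)[-1]
            # Prefix a slash so a top-level "logs/" folder also matches.
            if "/logs/" in "/" + lowered and "syslog" in base_name:
                matches.append(name)
        return matches
```

Calling `find_syslogs("tower-diagnostics-20220424-1703.zip")` on the attached file should then list whatever syslog files it contains, if the naming assumption holds.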
xrqp Posted April 25, 2022 Author
OK. I am running a parity check with no corrections now. It takes about 2 days. I did not correct any errors because I could not find any. Thanks again.
trurl Posted April 25, 2022
5 hours ago, xrqp said: "I did not correct any errors because I could not find any."
Not sure what you mean there. Do you mean you didn't notice any hardware problems? I didn't either, but I didn't look through all of those diagnostics.
Did any of your disks have non-zero in the Errors column on Main?
Do any of your disks have SMART warnings on the Dashboard page?
Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
xrqp Posted April 27, 2022 Author
At this moment (April 26, after the last parity check), on the Main page the "Errors" column is all zeros, the Dashboard shows "healthy" for all disks, and I attached a new diagnostics file.
Attached: tower-diagnostics-20220426-1935.zip
When I wrote on April 25th that I could not find errors, it was based on checking syslog, and also checking the SMART report for every disk.
Based on the emails below, it looks like this happened over the week:
Parity check - result
4/21 - 278 errors
4/25 - pass
4/26 - 256 errors
I was mixed up before when I started this thread. I thought the 278 errors were on 4/25.
Here are the last 2 emails I got when I ran parity checks. April 25 ran w/correction. April 26 ran w/o correction. The April 25 email included info on the previous parity check on April 21 with 278 errors.

EMAIL
Sent: Monday, April 25, 2022 12:20 AM
Subject: Unraid Status: Notice [TOWER] - array health report [PASS]
Event: Unraid Status
Subject: Notice [TOWER] - array health report [PASS]
Description: Array has 9 disks (including parity & cache)
Importance: normal
Parity - ST18000NM000J-2TV103_ZR52HS8A (sdg) - active 40 C [OK]
Disk 1 - ST18000NM000J-2TV103_ZR52VKR5 (sdk) - active 37 C [OK]
Disk 2 - ST18000NM000J-2TV103_ZR52BE8Q (sdi) - standby [OK]
Disk 3 - ST8000DM004-2CX188_ZCT06E5C (sdd) - standby [OK]
Disk 4 - ST8000DM004-2CX188_WCT0VFC8 (sdh) - active 31 C [OK]
Disk 5 - ST8000DM004-2CX188_ZR119PYR (sde) - active 30 C [OK]
Disk 6 - ST18000NM000J-2TV103_ZR52HRY7 (sdf) - standby [OK]
Disk 7 - WDC_WD80EFAX-68LHPN0_7HKBDRTF (sdj) - standby [OK]
Cache - CT1000P1SSD8_1842E1D21EFE (nvme0n1) - active 38 C [OK]
Parity is valid
Last checked on Thu 21 Apr 2022 10:50:21 AM PDT (4 days ago), finding 278 errors.
Duration: 1 day, 10 hours, 50 minutes, 20 seconds. Average speed: 143.5 MB/s

EMAIL
Sent: Tuesday, April 26, 2022 1:30 PM
Subject: Unraid Status: Notice [TOWER] - Parity check finished (256 errors)
Importance: High
Event: Unraid Parity check
Subject: Notice [TOWER] - Parity check finished (256 errors)
Description: Duration: 1 day, 12 hours, 27 minutes, 50 seconds. Average speed: 137.1 MB/s
Importance: warning
trurl Posted April 27, 2022 Share Posted April 27, 2022 11 hours ago, xrqp said: 4/21 - 278 errors This was a correcting check Spoiler Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656800 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656808 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656816 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656824 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656832 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656840 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656848 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656856 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656864 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656872 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656880 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656888 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656896 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656904 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656912 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656920 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656928 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656936 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656944 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656952 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656960 Apr 20 03:28:08 Tower kernel: md: recovery thread: P 
corrected, sector=4015656968 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656976 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656984 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656992 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657000 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657008 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657016 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657024 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657032 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657040 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657048 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657056 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657064 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657072 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657080 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657088 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657096 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657104 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657112 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657120 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657128 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657136 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657144 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657152 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657160 
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657168 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657176 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657184 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657192 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657200 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657208 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657216 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657224 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657232 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657240 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657248 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657256 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657264 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657272 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657280 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657288 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657296 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657304 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657312 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657320 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657328 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657336 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657344 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657352 Apr 20 03:28:08 Tower kernel: 
md: recovery thread: P corrected, sector=4015657360 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657368 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657376 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657384 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657392 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657400 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657408 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657416 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657424 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657432 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657440 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657448 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657456 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657464 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657472 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657480 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657488 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657496 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657504 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657512 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657520 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657528 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657536 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657544 Apr 20 03:28:08 Tower kernel: md: recovery thread: P 
corrected, sector=4015657552 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657560 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657568 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657576 Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015657584 Apr 20 03:28:08 Tower kernel: md: recovery thread: stopped logging 11 hours ago, xrqp said: 4/25 - pass I don't see this parity check in syslog, where are you seeing it? Screenshot? Perhaps you meant the Array Health email you received on 25th, which did say PASS, but also showed the parity sync errors from the check that finished on 21st. There was this noncorrecting check that started on 4/25, but it was the one that completed on 4/26. 11 hours ago, xrqp said: 4/26 - 256 errors. Spoiler Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657616 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657624 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657632 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657640 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657648 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657656 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657664 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657672 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657680 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657688 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657696 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657704 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657712 
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657720 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657728 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657736 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657744 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657752 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657760 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657768 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657776 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657784 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657792 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657800 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657808 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657816 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657824 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657832 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657840 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657848 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657856 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657864 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657872 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657880 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657888 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657896 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657904 Apr 25 06:07:49 Tower kernel: 
md: recovery thread: P incorrect, sector=4015657912 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657920 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657928 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657936 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657944 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657952 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657960 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657968 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657976 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657984 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657992 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658000 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658008 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658016 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658024 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658032 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658040 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658048 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658056 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658064 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658072 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658080 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658088 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658096 Apr 25 06:07:49 Tower kernel: md: recovery thread: P 
incorrect, sector=4015658104 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658112 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658120 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658128 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658136 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658144 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658152 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658160 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658168 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658176 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658184 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658192 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658200 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658208 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658216 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658224 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658232 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658240 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658248 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658256 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658264 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658272 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658280 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658288 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658296 
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658304 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658312 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658320 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658328 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658336 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658344 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658352 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658360 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658368 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658376 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658384 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658392 Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015658400 Apr 25 06:07:49 Tower kernel: md: recovery thread: stopped logging If you compare those (do the reveal), different sectors were incorrect each time. This strongly suggests BAD RAM. Have you done memtest? Quote Link to comment
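The comparison described above can also be done with a short script instead of by eye. This is only a sketch: the regular expression is built from the log lines quoted in this thread, and the helper names `sectors_by_kind` and `compare_checks` are made up for illustration. Note that the kernel stops logging after a while ("stopped logging"), so a script like this sees only the sectors that were actually written to syslog, not necessarily every error a check found.

```python
import re

# Matches the md driver lines quoted in this thread, for example:
# "Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792"
SECTOR_RE = re.compile(
    r"md: recovery thread: P (corrected|incorrect), sector=(\d+)"
)


def sectors_by_kind(log_text):
    """Collect logged sector numbers, grouped by 'corrected' / 'incorrect'."""
    found = {"corrected": set(), "incorrect": set()}
    for kind, sector in SECTOR_RE.findall(log_text):
        found[kind].add(int(sector))
    return found


def compare_checks(first, second):
    """Return (sectors in both checks, only in the first, only in the second)."""
    return first & second, first - second, second - first


if __name__ == "__main__":
    # Two lines from each check, copied from the spoilers above.
    check_1 = (
        "Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792\n"
        "Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656800\n"
    )
    check_2 = (
        "Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608\n"
        "Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657616\n"
    )
    both, only_first, only_second = compare_checks(
        sectors_by_kind(check_1)["corrected"],
        sectors_by_kind(check_2)["incorrect"],
    )
    print("in both:", sorted(both))
    print("only in first check:", sorted(only_first))
    print("only in second check:", sorted(only_second))
```

In practice you would feed it the full syslog text (or the two relevant date ranges) rather than pasted snippets.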
xrqp Posted April 27, 2022 Author
On the 4/25 email, I did think the health check was the same as a parity check. I realize now they are different. Thanks.
I will check the forum for how to do memtest in Unraid.
Could errors be caused by too much going on at one time? I may have been transferring files from a Windows 10 PC to Unraid, and downloading TV shows from antenna, and downloading from Sonarr and SABnzbd through the web, and running the Roon docker.
Should I run a correcting parity check? It is hard to do a parity check without corrections as there are 2 places where it is set to correct. In the scheduler I changed it to not correct, but in the Main window, near the bottom, next to Parity check, there is a little check box "correcting". I uncheck it, but when the screen refreshes, the check comes back by itself.
itimpi Posted April 27, 2022
Just now, xrqp said: "Could errors be caused by too much going on at one time? I may have been transferring files from a Windows 10 PC to Unraid, and downloading TV shows from antenna, and downloading from Sonarr and SABnzbd through the web, and running the Roon docker."
It should not make any difference, but perhaps if the system is heavily loaded something like power supply loading might be more of an issue.
xrqp Posted April 27, 2022 Author
I will see if I can put a watt meter on it, stress the workload, and see if I exceed my power supply, which I think is 400 watts (I need to check whether it is a 350 or 400 watt PSU). When I had 5 drives it peaked at about 90 watts and idled at 35 watts, but I did not try hard to max out the CPU.
xrqp Posted April 27, 2022 Author
Should I run a correcting parity check? It is hard to do a parity check without corrections as there are 2 places where it is set to correct. In the scheduler I changed it to not correct, but in the Main window, near the bottom, next to Parity check, there is a little check box "correcting". I uncheck it, but when the screen refreshes, the check comes back by itself.
JonathanM Posted April 27, 2022
If 2 consecutive non-correcting checks come back with different error addresses, you shouldn't run a correcting check until you figure out why you are getting errors and correct the issue. As was already stated, the most common cause for this type of thing is unreliable RAM, but that can be caused by unstable power, or a number of other issues. Until you get back-to-back identical checks you still have an issue you need to sort out by any means possible.
This also means that you can't rely on the data you read from the array being correct, and you especially should avoid writing to the array, because who knows if the data you write will be accurately stored.
You must get to the point of repeatable parity checks with zero errors before you can trust the server again.
trurl Posted April 28, 2022
You shouldn't even attempt to run a computer if the RAM is bad. Everything goes through RAM: the OS code and any other programs, your data, everything. The CPU can't execute any instructions until they are loaded into RAM, and can't do anything with any data until it is loaded into RAM.
After you get RAM tested and fixed, or determine that it is OK, don't correct parity, or indeed write to any disk, until you get identical consecutive non-correcting parity checks.
After you get identical non-correcting parity checks, you can correct parity, then run another non-correcting parity check to verify there are zero parity sync errors.
Do you have backups of anything important and irreplaceable? You should even if everything is working well, which it isn't.
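The order of operations recommended in the last few posts can be written down as a small decision function. This is just a sketch of the advice in this thread, not anything Unraid itself provides; the function name `next_step`, its parameters, and the returned strings are all made up for illustration.

```python
def next_step(ram_ok, noncorrecting_results):
    """Suggest the next action, following the advice given in this thread.

    ram_ok: True once memory has been tested (or replaced) and looks good.
    noncorrecting_results: list of sets of error sectors, one set per
        non-correcting parity check, oldest first. (Hypothetical input:
        in practice syslog only records the first batch of sectors.)
    """
    if not ram_ok:
        return "test RAM before anything else"
    if len(noncorrecting_results) < 2:
        return "run a non-correcting check"
    previous, latest = noncorrecting_results[-2:]
    if previous != latest:
        # Results are not repeatable, so something is still wrong.
        return "keep troubleshooting hardware, do not correct parity"
    if not latest:
        return "parity is in sync"
    return "run a correcting check, then a non-correcting check to verify zero errors"
```

For example, `next_step(True, [{100, 108}, {200, 208}])` returns the "keep troubleshooting" answer, which matches the situation described so far in this thread.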
MrGrey Posted April 28, 2022
On 4/24/2022 at 5:09 PM, xrqp said: "Should I just ignore the parity check saying there are errors?"
No. There are errors. I'm a newbie, barely using Unraid for a year or two, but I've found its errors to be better than most (I've only used Windows and Linux [mostly Arch]). Do you trust Unraid (as an OS) to give errors when it should? I think you have a parity error (multiples don't matter).
7 hours ago, xrqp said: "It is hard to do parity check without corrections as there are 2 places it is set to correct."
Why would you parity check and not correct?... Teach me (if you can reduce yourself to my level).
Mr. Grey
JonathanM Posted April 28, 2022
5 hours ago, MrGrey said: "Why would you parity check and not correct?"
Correcting involves reading the data disks and writing data to the parity drive. Until you have completely repeatable parity checks returning identical results on subsequent runs,
12 hours ago, JonathanM said: "you can't rely on the data you read from the array being correct, and you especially should avoid writing to the array, who knows if the data you write will be accurately stored."
trurl Posted April 28, 2022
6 hours ago, MrGrey said: "Why would you parity check and not correct?"
Because problems with other disks or other hardware could corrupt parity if you allow it to be "corrected".
In this particular thread, OP has sync errors that aren't repeatable, suggesting bad RAM or other issues.
trurl Posted April 28, 2022
13 hours ago, xrqp said: "It is hard to do parity check without corrections as there are 2 places it is set to correct."
In Settings - Scheduler, that controls whether or not scheduled (not user-initiated) parity checks are correcting. You have already set this to non-correcting. You can forget about scheduled checks until you get everything working well again. In fact, you might as well disable them for now.
In Main - Array Operations, you can start a parity check (user-initiated), and choose whether or not to correct parity errors. You should not correct parity until you get everything working well again.
You said you run parity checks every 3 months, but you didn't say whether the previous scheduled check had zero sync errors. Possibly you have had a serious problem for some time now.
10 hours ago, trurl said: "Do you have backups of anything important and irreplaceable? You should even if everything is working well, which it isn't."
First thing is to test memory.
xrqp Posted April 29, 2022 Author
I read that I can run memtest from the boot menu. To see the boot menu, I hook up a monitor and keyboard, then reboot. I will try to run memtest next week.
I tried to figure out whether I have UEFI or not by looking for a folder on the flash drive named efi~ or efi, so I can make sure it is efi~. But when I use Krusader and go to Root/FLASH, it says I cannot open that folder.
Edited April 29, 2022 by xrqp
trurl Posted April 29, 2022
15 hours ago, xrqp said: "I will try to run memtest next week."
So will you be shutting down until then?
On 4/27/2022 at 10:09 PM, trurl said: "You shouldn't even attempt to run a computer if the RAM is bad."
xrqp Posted April 29, 2022 Author
No, I was going to keep running it until Monday. But based on your reply, I will start memtest as soon as I get home tonight.
trurl Posted April 30, 2022
On 4/27/2022 at 10:09 PM, trurl said: "Everything goes through RAM, the OS code and any other programs, your data, everything. The CPU can't execute any instructions until they are loaded into RAM, and can't do anything with any data until it is loaded into RAM."
What typically happens with bad RAM is some bits will be wrong, which makes everything else wrong.
Wrong instructions don't just work poorly, they don't work at all.
Wrong data can become permanently wrong when it is saved. Saved corrupt data could be the filesystem metadata, which makes it impossible to even retrieve the good data.
xrqp Posted May 1, 2022 Author
I ran memtest for about 24 hours. When I stopped it, it said "Pass:10 Errors:0". It was running test #9 when I stopped it, so I do not know what "Pass:10" means. I think I passed the memtest.
So I am now running Unraid again. What do I do now?
JorgeB Posted May 1, 2022
Strange that the sync errors were detected in a similar zone and they are all sequential. This makes me not suspect RAM:
1st check
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656792
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656800
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656808
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656816
Apr 20 03:28:08 Tower kernel: md: recovery thread: P corrected, sector=4015656824
etc
2nd check
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657608
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657616
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657624
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657632
Apr 25 06:07:49 Tower kernel: md: recovery thread: P incorrect, sector=4015657640
etc
Since you rebooted, run a correcting check, then a non-correcting one, and post new diags. If it's a disk issue, it's much more difficult to identify the culprit.
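The "sequential and in a similar zone" observation is easy to check numerically. The sketch below uses only the sector numbers quoted in this post; the helper name `stride_summary` is made up, and the conversion to bytes assumes 512-byte sectors, which is not stated anywhere in this thread.

```python
def stride_summary(sectors):
    """Return (first, last, set of gaps between neighbouring sectors).

    Expects a non-empty collection of sector numbers.
    """
    ordered = sorted(sectors)
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return ordered[0], ordered[-1], gaps


# Sector numbers copied from the two log excerpts above.
first_check = [4015656792, 4015656800, 4015656808, 4015656816, 4015656824]
second_check = [4015657608, 4015657616, 4015657624, 4015657632, 4015657640]

if __name__ == "__main__":
    for label, sectors in (("1st check", first_check), ("2nd check", second_check)):
        first, last, gaps = stride_summary(sectors)
        print(label, "from", first, "to", last, "gaps:", sorted(gaps))

    # Distance between where the two logged runs begin.
    distance = second_check[0] - first_check[0]
    # Assumption: 512-byte sectors (not confirmed in this thread).
    print("runs start", distance, "sectors apart, about", distance * 512 // 1024, "KiB")
```

A single gap value for each run means the logged errors form one evenly spaced block rather than being scattered across the disk, which is the pattern being pointed out here.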
xrqp Posted May 2, 2022 Author
I will run a correcting parity check, get diagnostics, then run a non-correcting parity check and get diagnostics. Each parity check takes about 2 days.
One problem I have is that my internet is DSL on a very noisy copper-wire phone line. I have voice phone (POTS) on the same line and it is so noisy we cannot use it for voice anymore.
Edited May 2, 2022 by xrqp