[SOLVED] parity check errors


pmpj218

Recommended Posts

Monthly parity check ran on the 1st at midnight and came back with 5 errors. Can someone let me know what these errors mean and if I need to do anything to remedy? Here's what the syslog shows during the check. Please let me know if I need to provide any additional info. Thanks!

 

Sep  1 00:00:02 unraid kernel: mdcmd (486): check NOCORRECT (unRAID engine)
Sep  1 00:00:02 unraid kernel:  (Routine)
Sep  1 00:00:02 unraid kernel: md: recovery thread woken up ... (unRAID engine)
Sep  1 00:00:02 unraid kernel: md: recovery thread checking parity... (unRAID engine)
Sep  1 00:00:02 unraid kernel: md: using 1152k window, over a total of 1953514552 blocks. (unRAID engine)
Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19832 (Errors)
Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19880 (Errors)
Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19920 (Errors)
Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19928 (Errors)
Sep  1 00:30:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 01:20:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 02:10:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 02:46:00 unraid afpd[31180]: bad function 7A
Sep  1 02:46:00 unraid afpd[31180]: AFP3.3 Login by dana
Sep  1 02:48:27 unraid afpd[31180]: afp_alarm: child timed out, entering disconnected state
Sep  1 03:00:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 03:12:29 unraid kernel: md: parity incorrect: 1821402704 (Errors)
Sep  1 03:40:01 unraid logger: mover started
Sep  1 03:40:01 unraid logger: moving Movies/
Sep  1 03:40:01 unraid logger: ./Movies/.AppleDB
Sep  1 03:40:01 unraid logger: .d..t...... ./
Sep  1 03:40:01 unraid logger: .d..t...... Movies/
Sep  1 03:40:01 unraid logger: .d..t...... Movies/.AppleDB/
Sep  1 03:40:01 unraid logger: ./Movies/
Sep  1 03:40:01 unraid logger: .d..t...... Movies/
Sep  1 03:40:02 unraid logger: moving TV/
Sep  1 03:40:02 unraid logger: ./TV/.AppleDB
Sep  1 03:40:02 unraid logger: .d..t...... ./
Sep  1 03:40:02 unraid logger: .d..t...... TV/
Sep  1 03:40:02 unraid logger: .d..t...... TV/.AppleDB/
Sep  1 03:40:02 unraid logger: ./TV/
Sep  1 03:40:02 unraid logger: .d..t...... TV/
Sep  1 03:40:02 unraid logger: mover finished
Sep  1 03:50:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 04:40:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 04:46:17 unraid afpd[3010]: bad function 7A
Sep  1 04:46:21 unraid afpd[3010]: AFP3.3 Login by dana
Sep  1 04:46:21 unraid afpd[3010]: afp_disconnect: trying primary reconnect
Sep  1 04:46:21 unraid afpd[1731]: Reconnect: transfering session to child[31180]
Sep  1 04:46:21 unraid afpd[1731]: Reconnect: killing new session child[3010] after transfer (Minor Issues)
Sep  1 04:46:21 unraid afpd[31180]: afp_dsi_transfer_session: succesfull primary reconnect
Sep  1 04:46:21 unraid afpd[31180]: AFP Replay Cache match: id: 75 / cmd: AFP_GETVOLPARAM
Sep  1 04:46:23 unraid afpd[3010]: afp_disconnect: primary reconnect succeeded
Sep  1 04:48:29 unraid afpd[31180]: afp_alarm: child timed out, entering disconnected state
Sep  1 05:30:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 06:20:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 06:46:32 unraid afpd[5617]: bad function 7A
Sep  1 06:46:35 unraid afpd[5617]: AFP3.3 Login by dana
Sep  1 06:46:36 unraid afpd[5617]: afp_disconnect: trying primary reconnect
Sep  1 06:46:36 unraid afpd[1731]: Reconnect: transfering session to child[31180]
Sep  1 06:46:36 unraid afpd[1731]: Reconnect: killing new session child[5617] after transfer (Minor Issues)
Sep  1 06:46:36 unraid afpd[31180]: afp_dsi_transfer_session: succesfull primary reconnect
Sep  1 06:46:38 unraid afpd[5617]: afp_disconnect: primary reconnect succeeded
Sep  1 06:49:02 unraid afpd[31180]: afp_alarm: child timed out, entering disconnected state
Sep  1 07:10:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 08:00:01 unraid crond[1188]: ignoring /var/spool/cron/crontabs/root- (non-existent user) 
Sep  1 08:06:48 unraid afpd[17998]: bad function 7A
Sep  1 08:06:48 unraid afpd[17998]: AFP3.3 Login by dana
Sep  1 08:06:48 unraid afpd[17998]: afp_disconnect: trying primary reconnect
Sep  1 08:06:48 unraid afpd[1731]: Reconnect: transfering session to child[31180]
Sep  1 08:06:48 unraid afpd[1731]: Reconnect: killing new session child[17998] after transfer (Minor Issues)
Sep  1 08:06:48 unraid afpd[31180]: afp_dsi_transfer_session: succesfull primary reconnect
Sep  1 08:06:48 unraid afpd[31180]: AFP Replay Cache match: id: 217 / cmd: AFP_CLOSEVOL
Sep  1 08:06:49 unraid afpd[31180]: AFP logout by dana
Sep  1 08:06:49 unraid afpd[31180]: AFP statistics: 7.03 KB read, 10.58 KB written
Sep  1 08:06:49 unraid afpd[31180]: done
Sep  1 08:06:50 unraid afpd[17998]: afp_disconnect: primary reconnect succeeded
Sep  1 08:07:25 unraid afpd[18822]: AFP3.3 Login by dana
Sep  1 08:07:46 unraid cnid_dbd[19251]: Set syslog logging to level: LOG_NOTE
Sep  1 08:13:38 unraid kernel: md: sync done. time=29616sec (unRAID engine)
Sep  1 08:13:38 unraid kernel: md: recovery thread sync completion status: 0 (unRAID engine)

The syslog is showing that you have several parity sync errors.  If your server had a successful parity check, and then, before any reboot, you ran another parity check and it had sync errors, that is a serious problem.  In order to provide its protection, unRAID must be able to maintain parity accurately.  If it can't, you will not be able to recover from a drive failure.
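To illustrate why accurate parity matters: with single parity, each parity bit is the XOR of the corresponding bits on every data disk, and a failed disk is rebuilt by XOR-ing the survivors with parity. A minimal sketch with tiny hypothetical "disks" (the byte values are made up for illustration):

```python
from functools import reduce

# Three hypothetical data "disks", each a small block of bytes.
disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8]), bytes([9, 10, 11, 12])]

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (how single parity is computed)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

parity = xor_blocks(disks)

# Simulate losing disk 1: XOR parity with the surviving disks to rebuild it.
survivors = [disks[0], disks[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == disks[1]  # reconstruction only works if parity was correct
```

If a parity block on disk is stale or corrupt, that final reconstruction silently produces wrong data, which is exactly why sync errors need to be chased down.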

 

There are a few reasons that come to mind for parity getting out of whack:

 

1 - As already mentioned, a hard power down / restart will cause parity sync errors.  This is by far the most common cause.

 

2 - Upsizing / replacing a disk under 4.7 can cause this to occur.  A bug fix was made in a recent 5.0 beta for this.  You said you are running a recent beta, so this should not be the cause.

 

3 - Bad or misconfigured RAM.  Parity calculations can be corrupted if RAM is bad.  This can happen on the parity update (i.e., during the write) or can happen on the parity read (i.e., during the parity check).

 

4 - Bad or marginal data or power connections to the disks.  Resecuring data and power cable connections is quick and easy to do, and has been known to solve all sorts of problems.

 

5 - Underpowered or bad PSU.  Good steady power is needed to read and write data to your disks.  (Very hard to isolate a power issue without swapping out the PSU.)

 

6 - You mention that you have had some power fluctuations, but are using a UPS which has kept the server running.  This is as it should be.  But I can't rule out some power-related issue occurring during one of the power losses. 

 

First thing I would do is power down, check / resecure your drive cabling, and reseat your RAM modules.  On reboot, go into the BIOS and reconfirm / correct the memory parameters.  Then boot unRAID and run a couple of very short non-correcting parity checks.  All but one of your sync errors occurred very early in the array, so within a minute (or even a few seconds) you should see them repeat (you can then stop the parity check).  Run it 10 times.  Check the syslog and compare the block #s.  They should be identical on each run.
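One way to compare the block #s across runs is to pull the "md: parity incorrect" kernel lines out of each run's syslog. A sketch (the log format is taken from the syslog above; the two excerpts here are hypothetical runs):

```python
import re

def incorrect_blocks(syslog_text):
    """Extract block numbers from 'md: parity incorrect: N' kernel lines."""
    return [int(n) for n in re.findall(r"md: parity incorrect: (\d+)", syslog_text)]

run1 = ("Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19832 (Errors)\n"
        "Sep  1 00:00:12 unraid kernel: md: parity incorrect: 19880 (Errors)\n")
run2 = ("Sep  2 00:00:12 unraid kernel: md: parity incorrect: 19832 (Errors)\n"
        "Sep  2 00:00:12 unraid kernel: md: parity incorrect: 19880 (Errors)\n")

# Identical block numbers on every run point toward disks/cabling;
# block numbers that shift between runs point more toward RAM or power.
assert incorrect_blocks(run1) == incorrect_blocks(run2)
```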

 

Run an overnight memory test.

 

If your parity sync errors are consistent, and your memory test shows no errors after an overnight run, run the correcting parity check. 

 

Then run parity checks every night for a few days.  Use the array during the day as per normal.

 

If you don't get more parity sync errors, I'd still run the parity checks weekly for 4-6 weeks to gain confidence that the array is maintaining parity.  If parity is still being maintained after all this testing, then it begins to look like either your resecuring of the cabling fixed things or the problem had something to do with your UPS.

 

Post back on your progress.

 

Good luck!

Archived

This topic is now archived and is closed to further replies.
