SYSLOG full because of "read error"?


NLS
Solved by JorgeB


So my syslog got full, while rebuilding a disk from parity (as seen in my previous threads).

 

The rebuild is at around 50% and progressing without any issues reported in the GUI, although a notification did pop up about a single error (probably a bad sector on parity?)... but since then there have been no further issues, and if I hadn't noticed the log getting full I would think things were OK.

 

So the lines that fill up the syslog are as follows:

Sep 28 10:17:59 <my server> kernel: md: disk0 read error, sector=2169270496


(with the sector number changing on every line)

The last entry (because the log filled up) was 70 minutes ago (10:17:59 or so, local time) and I am already about 4 hours into the rebuild.

 

So I looked to find the FIRST such entry in the log.

What I found was very interesting. The first entry in the log, more than 1.5 million lines above the last, WAS IN THE SAME MINUTE (10:17:06).

It actually "burst" 1.7 million lines in the same minute, so I doubt it correctly identifies the error.

(Is it realistic to find 1.7 million sectors with problems within 50 seconds? Plus who knows how many more after the log was full?)
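For anyone who wants to check the same thing in their own log, here is a minimal Python sketch of how the first and last "read error" entries, the per-second burst rate, and the number of distinct sectors could be pulled out. It assumes the line format shown above with a hostname that contains no spaces; the function name `summarize_read_errors` and the file name in the usage comment are only illustrative.

```python
import re
from collections import Counter

# Matches lines like:
#   Sep 28 10:17:59 myserver kernel: md: disk0 read error, sector=2169270496
# Assumption: the hostname field contains no spaces.
READ_ERROR = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: "
    r"md: disk(?P<disk>\d+) read error, sector=(?P<sector>\d+)"
)

def summarize_read_errors(lines):
    """Return (first_stamp, last_stamp, errors_per_second, distinct_sector_count)."""
    per_second = Counter()
    sectors = set()
    first = last = None
    for line in lines:
        m = READ_ERROR.match(line)
        if not m:
            continue  # not an md read-error line
        stamp = m.group("stamp")
        if first is None:
            first = stamp
        last = stamp
        per_second[stamp] += 1
        sectors.add(int(m.group("sector")))
    return first, last, per_second, len(sectors)

# Example usage (hypothetical file name):
# with open("syslog.txt", errors="replace") as fh:
#     first, last, per_second, distinct = summarize_read_errors(fh)
#     print(first, last, per_second.most_common(3), distinct)
```

Comparing the distinct sector count with the total line count shows whether the same sectors are being reported over and over or whether every line really is a new sector.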

 

Also, I am not sure which disk is "disk0", as I don't have one by that name anywhere. Is it the parity?

What is happening?

 

I am attaching a truncated version of the log.

 

syslog.txt


...erm actually just noticed in the GUI...

 

Current operation started on Wednesday, 28-09-2022, 07:34 (today)
 Elapsed time: 4 hours, 51 minutes
 Estimated finish: 2 hours, 32 minutes
 Finding 219564171 errors

 

The number is a bit unrealistic. So, could it again be a cable issue, and is that on the parity?

Yes, it is on the parity, as I also see this:

 

Parity | WDC_WD40EFRX-68N32N0_WD-WCC7K6SYT2RN - 4 TB (sdb) | Temp: * | Reads: 493 984 963 | Writes: 5944 | Errors: 224 437 515

 

So, about half the reads (!?) fail?
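As a quick sanity check on "about half", the ratio can be worked out from the two counters in the GUI row above. This sketch assumes the large numbers in that row are the Reads and Errors columns for the parity disk.

```python
# Counters copied from the GUI row for the parity disk.
# Assumption: the first figure is "Reads" and the last is "Errors".
reads = 493_984_963
errors = 224_437_515

error_ratio = errors / reads
print(f"{error_ratio:.1%} of reads reported an error")
```

That works out to roughly 45%, so "about half" is a fair description.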

Any ideas?

 

Also, is there any disk test appropriate for the parity?
(which, as I understand it, doesn't have a proper filesystem?)

 

15 minutes ago, JorgeB said:

 

ata6: SError: { UnrecovData 10B8B BadCRC }
 
 

This is usually the result of a bad SATA cable.
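To see which ATA port is throwing these, a small Python sketch along the following lines could count the `BadCRC` SError entries per port in a saved syslog. It assumes the kernel line format quoted above; the function name `count_crc_errors` is made up for illustration, and mapping a port such as `ata6` back to a physical disk is not covered here.

```python
import re
from collections import Counter

# Matches kernel lines like: "ata6: SError: { UnrecovData 10B8B BadCRC }"
SERROR = re.compile(r"\b(?P<port>ata\d+)(?:\.\d+)?: SError: \{(?P<flags>[^}]*)\}")

def count_crc_errors(lines):
    """Count SError lines that carry the BadCRC flag, grouped by ATA port."""
    counts = Counter()
    for line in lines:
        m = SERROR.search(line)
        if m and "BadCRC" in m.group("flags").split():
            counts[m.group("port")] += 1
    return counts

# Example usage (hypothetical file name):
# with open("syslog.txt", errors="replace") as fh:
#     print(count_crc_errors(fh))
```

If a single port accounts for all the hits, that points at the cable or connector on that one link.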
 

 

I am going to wait for the parity build to finish (it is more than 70% now). I am not close to the server anyway.

Or should I just stop it so it doesn't come online?


Assuming it is the cable, what is the proper procedure (after replacing it)? How can I force disk 9 to rebuild from the start?
Delete whatever partition it created?

 

Also is there any disk check appropriate for parity? (before starting rebuild)

 


Seems it was indeed the cable.

Which SUCKS because, as whoever followed my three latest threads can see, I got all kinds of cable errors.
And OK, the parity disk is on a normal SATA cable that goes to the mobo.
The others are on 1-to-4 SAS/SATA cables that cannot be replaced very fast (they need to be ordered from eBay etc.).

Anyway... now parity is rebuilding the emulated disk, with 0 errors so far (3.2%). Knock on wood.

(then I will replace the parity with a bigger new disk, rebuild, then replace the very old temporary 3TB that is rebuilding right now with a new, bigger one and build YET again, then use the old 4TB parity to replace one more older 3TB data disk and yes... rebuild AGAIN)

Note: I never got a reply on how someone could possibly (surface?) check the parity disk.

 

