Parity1 errors, started during parity check

Rich · September 2, 2021

Hi All,

Since its last reboot (100-ish days ago), my server has been running without any problems (that I've noticed) until yesterday, when I saw a few errors appear during its monthly parity check.

During the parity check, I noticed that parity drive 1 had some read errors. The syslog showed the entries below, which I assume refer to the disk errors. This has since happened multiple times, which has pushed the error count for parity 1 on the main unRAID page to a current total of 1618.
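One way to check whether the errors keep hitting the same drive and the same area is to tally the relevant syslog lines. The sketch below is a minimal illustration that assumes only the two line formats visible in the excerpts in this thread (`md: diskN read error, sector=...` and `blk_update_request: critical medium error, dev ..., sector ...`); `summarize_syslog` is a hypothetical helper, and the number it counts is not necessarily computed the same way as the error counter on the unRAID Main page.

```python
import re
from collections import Counter

# Matches md driver lines such as:
#   "... kernel: md: disk0 read error, sector=12052792976"
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")

# Matches block-layer lines such as:
#   "... blk_update_request: critical medium error, dev sdh, sector 12052793040 ..."
MEDIUM_ERROR = re.compile(r"critical medium error, dev (\w+), sector (\d+)")


def summarize_syslog(lines):
    """Count md read errors per md slot and collect (device, sector) medium errors."""
    md_counts = Counter()
    medium_errors = []
    for line in lines:
        m = MD_READ_ERROR.search(line)
        if m:
            md_counts[m.group(1)] += 1
            continue
        m = MEDIUM_ERROR.search(line)
        if m:
            medium_errors.append((m.group(1), int(m.group(2))))
    return md_counts, medium_errors
```

Fed the lines of a saved syslog (for example `open(path).readlines()`), it returns a per-slot count of md read errors plus the list of failing device sectors, which makes it easy to see whether the same sectors recur from one parity check to the next.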

I have THIS case, which I have been using since February this year without any issues.

I appreciate that there are likely several potential causes, e.g. cables, backplane or hard drive. However, before I start taking things apart to test, I wanted to post here to see whether the errors point to one cause being more likely than the others, and whether there are any likely causes I haven't listed above.

Any help is greatly appreciated.

Thanks.

Rich

Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=11s
Sep  1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 Sense Key : 0x3 [current] [descriptor] 
Sep  1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 ASC=0x11 ASCQ=0x0 
Sep  1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 CDB: opcode=0x88 88 00 00 00 00 02 ce 67 06 70 00 00 04 00 00 00
Sep  1 14:20:01 unRAID-mini kernel: blk_update_request: critical medium error, dev sdh, sector 12052793040 op 0x0:(READ) flags 0x0 phys_seg 116 prio class 0
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792976
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792984
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792992
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793000
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793008
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793016
**** edited for length as also in attached syslog ****
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793888
Sep  1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793896
Sep  1 14:21:01 unRAID-mini sSMTP[20478]: Creating SSL connection to host
Sep  1 14:21:01 unRAID-mini sSMTP[20478]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
Sep  1 14:21:02 unRAID-mini sSMTP[20478]: Sent mail for ********** (221 2.0.0 csmtp7.tb.ukmail.iss.as9143.net cmsmtp closing connection) uid=0 username=root outbytes=797

unraid-mini-diagnostics-20210902-1536.zip unraid-mini-smart-20210902-1531.zip
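The `CDB:` line records the command the drive was executing when it failed. Opcode 0x88 is SCSI READ(16), whose bytes 2 to 9 hold the starting logical block address and bytes 10 to 13 the transfer length. A minimal decoding sketch follows; `decode_read16` is a hypothetical helper, and comparing its LBA with the sector printed by `blk_update_request` assumes 512-byte logical blocks, which the numbers in this log are consistent with.

```python
def decode_read16(cdb_hex):
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: number of blocks
    return lba, length


def offset_in_request(cdb_hex, failing_sector):
    """How many blocks into the READ request the reported failing sector lies."""
    lba, length = decode_read16(cdb_hex)
    offset = failing_sector - lba
    if not 0 <= offset < length:
        raise ValueError("failing sector is outside this request")
    return offset
```

For the first excerpt, the request starts at LBA 12052792944 and covers 1024 blocks, so the failing sector 12052793040 reported by `blk_update_request` is 96 blocks into it: the start of the read succeeded and the drive failed partway through.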

trurl · September 2, 2021

Disable spindown on parity and run an extended SMART test.

JorgeB · September 2, 2021

They are logged as disk errors, though these can be intermittent. As mentioned, an extended SMART test is what you should do.
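The "disk error" reading can be checked against the sense data in the `sd` lines: sense key 0x3 is MEDIUM ERROR in the SCSI sense-key table, and ASC/ASCQ 0x11/0x00 is UNRECOVERED READ ERROR, so the failed read was reported as a medium error rather than, say, a hardware or aborted-command error. The `mpt3sas` `log_info` word is the HBA's own status; the sketch only splits it into the fields the driver prints, assuming a layout of bits 31-28 bus type, 27-24 originator, 23-16 code and 15-0 sub-code (which reproduces the `originator(PL), code(0x08), sub_code(0x0000)` text above). It does not say what code 0x08 means. All helper names are hypothetical.

```python
# Sense keys from the SCSI Primary Commands (SPC) table; only a subset is listed.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}

# Only the ASC/ASCQ pair seen in this thread's logs.
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}

# Originator names used by the mpt3sas driver when printing log_info.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def describe_sense(key, asc, ascq):
    """Turn the Sense Key / ASC / ASCQ values from the sd lines into text."""
    k = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    a = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:x} ASCQ=0x{ascq:x}")
    return f"{k}: {a}"


def split_loginfo(value):
    """Split an mpt3sas log_info word into (bus_type, originator, code, sub_code)."""
    bus_type = (value >> 28) & 0xF
    originator = (value >> 24) & 0xF
    code = (value >> 16) & 0xFF
    sub_code = value & 0xFFFF
    return bus_type, ORIGINATORS.get(originator, originator), code, sub_code
```

With the values from the excerpt, `describe_sense(0x3, 0x11, 0x0)` gives `"MEDIUM ERROR: UNRECOVERED READ ERROR"` and `split_loginfo(0x31080000)` gives `(3, "PL", 8, 0)`.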

Rich · September 2, 2021

Ok, will do. Thank you.

Rich · September 6, 2021

The extended SMART test came back as "Completed without Error", which surprised me.

The read errors are up to 2964 now. Does any of this indicate what the most likely cause could be?

Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep  5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=6s
Sep  5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 Sense Key : 0x3 [current] [descriptor] 
Sep  5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 ASC=0x11 ASCQ=0x0 
Sep  5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 CDB: opcode=0x88 88 00 00 00 00 05 1d 0a 23 38 00 00 04 00 00 00
Sep  5 02:09:48 unRAID-mini kernel: blk_update_request: critical medium error, dev sdh, sector 21962041088 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041024
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041032
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041040
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041048
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041056
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041064
Sep  5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041072

unraid-mini-smart-20210902-1740.zip
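Both excerpts fit the same arithmetic. Each READ(16) request covers 1024 blocks; everything from the failing sector to the end of the request fails; and that tail, counted in 8-sector (4 KiB) steps, matches both the `phys_seg` value and the number of `md: disk0 read error` lines. The md sector numbers are also exactly 64 below the `sdh` sector numbers. The sketch below is only a consistency check built from these two logs, not how the kernel or unRAID actually computes anything: `PARTITION_START = 64` is inferred from the log rather than read from the partition table, the start LBA 21962040120 comes from decoding the CDB bytes, and `md_errors_for_request` is a hypothetical helper.

```python
SECTORS_PER_PAGE = 8  # 4 KiB / 512-byte sectors; md error sectors in the log step by 8
PARTITION_START = 64  # assumption: offset between sd and md sector numbers seen in the log


def md_errors_for_request(start_lba, length, failing_sector):
    """Predict the md read-error sectors for the failed tail of one READ request."""
    failed = start_lba + length - failing_sector  # sectors from failure to end of request
    if not 0 < failed <= length:
        raise ValueError("failing sector is outside the request")
    first_md = failing_sector - PARTITION_START
    return [first_md + i * SECTORS_PER_PAGE for i in range(failed // SECTORS_PER_PAGE)]
```

For the September 5 excerpt, `md_errors_for_request(21962040120, 1024, 21962041088)` returns seven sectors from 21962041024 to 21962041072, matching the seven md lines and `phys_seg 7`. The first excerpt gives 116 sectors ending at 12052793896, matching `phys_seg 116`. The two failing sectors are also far apart on the disk, so the errors are not confined to one small area.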

JorgeB · September 6, 2021

Since the SMART test was successful, the disk should be OK. Replace/swap cables to see if there's any difference.

Rich · September 6, 2021

Will do. Thank you.

Rich · September 8, 2021

So I moved the disk to a separate bay in the server, with different data and power cables, and 156 errors occurred overnight.

To me that says the disk is probably the issue, but as it passed the extended SMART test and I'm not a pro on backplanes and expanders, should I consider anything else, or is the obvious indicator the one to go with here?

JorgeB · September 8, 2021

Rich said:

That says to me the disk is probably the issue

I agree. The problem might be with the disk's interface, and a SMART test won't test that part.

Rich · September 8, 2021

OK, I'll focus on verifying that it's the disk. Thanks for your help.
