Rich Posted September 2, 2021

Hi All,

Since its last reboot (about 100 days ago) my server has been running without any problems that I've noticed, until yesterday, when a few errors appeared during its monthly parity check.

During the parity check, I noticed that parity drive 1 had some read errors. The syslog showed the entries below, which I assume refer to the disk errors. This has since occurred multiple times, which has increased the error count on the main unRAID page to the current total of 1618 for parity 1.

I have THIS case, which I have been using since February this year without any issues.

I appreciate that there are several potential causes, e.g. cables, backplane or hard drive. However, before I start taking things apart to test, I wanted to post here to see whether the error points to one cause more than the others, and whether there are any likely causes that I have not listed above.

Any help is greatly appreciated.

Thanks.

Rich

Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=11s
Sep 1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 Sense Key : 0x3 [current] [descriptor]
Sep 1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 ASC=0x11 ASCQ=0x0
Sep 1 14:20:01 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#9374 CDB: opcode=0x88 88 00 00 00 00 02 ce 67 06 70 00 00 04 00 00 00
Sep 1 14:20:01 unRAID-mini kernel: blk_update_request: critical medium error, dev sdh, sector 12052793040 op 0x0:(READ) flags 0x0 phys_seg 116 prio class 0
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792976
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792984
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052792992
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793000
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793008
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793016
**** edited for length as also in attached syslog ****
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793888
Sep 1 14:20:01 unRAID-mini kernel: md: disk0 read error, sector=12052793896
Sep 1 14:21:01 unRAID-mini sSMTP[20478]: Creating SSL connection to host
Sep 1 14:21:01 unRAID-mini sSMTP[20478]: SSL connection using ECDHE-RSA-AES256-GCM-SHA384
Sep 1 14:21:02 unRAID-mini sSMTP[20478]: Sent mail for ********** (221 2.0.0 csmtp7.tb.ukmail.iss.as9143.net cmsmtp closing connection) uid=0 username=root outbytes=797

Attachments: unraid-mini-diagnostics-20210902-1536.zip, unraid-mini-smart-20210902-1531.zip
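For anyone reading the log excerpt above, the CDB line can be decoded by hand to see which part of the disk the failing command covered. The sketch below is only an illustration, not something posted in the thread: it assumes the standard READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, a 4-byte block count in bytes 10 to 13), and `decode_read16` is a hypothetical helper name. The 64-sector difference between the `sdh` sector and the first `md` sector is read straight off the log; treating it as the partition start offset is an assumption.

```python
def decode_read16(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is ignored
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")     # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: transfer length in blocks
    return lba, count


# Values copied from the Sep 1 log lines.
CDB = "88 00 00 00 00 02 ce 67 06 70 00 00 04 00 00 00"
FAILED_SECTOR = 12052793040    # from the blk_update_request line (dev sdh)
FIRST_MD_SECTOR = 12052792976  # from the first "md: disk0 read error" line

lba, count = decode_read16(CDB)
print(f"read of {count} blocks starting at LBA {lba}")
print("failed sector inside this read:", lba <= FAILED_SECTOR < lba + count)
# Assumption: this constant gap is the partition start offset on the device.
print("device sector minus md sector:", FAILED_SECTOR - FIRST_MD_SECTOR)
```

The point of the check is simply that the reported medium error falls inside the range the command asked for, so the `md` read errors and the `sdh` error describe the same region of the disk.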
trurl Posted September 2, 2021

Disable spindown on parity and run an extended SMART test.
JorgeB Posted September 2, 2021

They are logged as disk errors, though they can be intermittent. As mentioned, an extended SMART test is what you should do.
Rich Posted September 2, 2021

Ok, will do. Thank you.
Rich Posted September 6, 2021

The extended SMART test came back as "Completed without Error", which surprised me.

The read errors are up to 2964 now.

Does any of this indicate what the most likely cause could be?

Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: mpt3sas_cm0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
Sep 5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08 cmd_age=6s
Sep 5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 Sense Key : 0x3 [current] [descriptor]
Sep 5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 ASC=0x11 ASCQ=0x0
Sep 5 02:09:48 unRAID-mini kernel: sd 2:0:5:0: [sdh] tag#6594 CDB: opcode=0x88 88 00 00 00 00 05 1d 0a 23 38 00 00 04 00 00 00
Sep 5 02:09:48 unRAID-mini kernel: blk_update_request: critical medium error, dev sdh, sector 21962041088 op 0x0:(READ) flags 0x0 phys_seg 7 prio class 0
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041024
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041032
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041040
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041048
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041056
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041064
Sep 5 02:09:48 unRAID-mini kernel: md: disk0 read error, sector=21962041072

Attachment: unraid-mini-smart-20210902-1740.zip
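As a reading aid for the repeated `log_info(0x31080000)` and sense lines, here is a small sketch that splits those values into their fields. It is an illustration under stated assumptions, not a definitive decoder: the bit layout (top 4 bits bus type, next 4 bits originator, next 8 bits code, low 16 bits sub-code) is assumed, though it agrees with the `originator(PL), code(0x08), sub_code(0x0000)` text the driver prints. The sense lookup covers only the values seen in this thread, using the standard SCSI names for sense key 0x3 and ASC/ASCQ 0x11/0x00. `decode_log_info` and `describe_sense` are hypothetical helper names.

```python
# Assumed originator numbering; only PL is confirmed by the log text itself.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}

# Only the values that appear in this thread, with their standard SCSI names.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ASC_ASCQ = {(0x11, 0x00): "UNRECOVERED READ ERROR"}


def decode_log_info(value: int) -> dict:
    """Split a 32-bit log_info value into its (assumed) bit fields."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def describe_sense(key: int, asc: int, ascq: int) -> str:
    """Return a readable description for a sense key and ASC/ASCQ pair."""
    key_name = SENSE_KEYS.get(key, f"sense key 0x{key:x}")
    detail = ASC_ASCQ.get((asc, ascq), f"ASC=0x{asc:02x} ASCQ=0x{ascq:02x}")
    return f"{key_name}: {detail}"


print(decode_log_info(0x31080000))
print(describe_sense(0x3, 0x11, 0x00))
```

Decoded this way, the sense data says the drive itself reported that it could not read the requested blocks, which matches the kernel's "critical medium error" line.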
JorgeB Posted September 6, 2021

Since the SMART test was successful, the disk should be OK. Replace or swap the cables to see if there's any difference.
Rich Posted September 6, 2021

Will do. Thank you.
Rich Posted September 8, 2021

So I moved the disk to a separate bay in the server with different data and power cables, and 156 errors occurred overnight.

That says to me the disk is probably the issue, but as it passed the extended SMART test and I'm not a pro on backplanes and expanders, should I consider anything else, or is the obvious indicator the one to go with here?
JorgeB Posted September 8, 2021

1 hour ago, Rich said: That says to me the disk is probably the issue

I agree. The problem might be with the disk's interface, and a SMART test won't test that part.
Rich Posted September 8, 2021

Ok, I'll focus on verifying that it's the disk.

Thanks for your help.