PsionStorm, posted September 15, 2023:
About three weeks ago, I added some new drives to my system and upgraded my parity drive to a 14TB drive. Since then everything has been functioning without error, but this morning my monthly parity check fired up. I'm three hours in, and I've got a scary number of sync errors that I've never seen before. I'm hoping this is just a result of missing a step with the drive additions. At this point, should I just cancel the scheduled parity check and force a correcting parity check? Or should I see this one through first?
Attachment: excelsior-diagnostics-20230915-0259.zip
JorgeB, posted September 15, 2023:
1 hour ago, PsionStorm said: "and upgraded my parity drive to a 14TB drive"
Did you do a parity swap at this time?
itimpi, posted September 15, 2023:
Has the parity check got beyond the size of the biggest data drive? Did you use the parity swap procedure to upgrade the parity drive? If so, it seems that the space beyond the largest drive is not always correctly zeroed on the new parity drive, and you need to run a correcting check once to fix this. After that, future checks should be error free.

However, I do not think that is your problem here. It looks as if there may be some other problem that needs looking at first, as your syslog contains continual messages of this type:

Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: attempting task abort!scmd(0x000000001e0cffe4), outstanding for 30528 ms & timeout 30000 ms
Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: [sdi] tag#2226 CDB: opcode=0x88 88 00 00 00 00 00 9c b9 b4 40 00 00 00 08 00 00
Sep 12 14:46:15 Excelsior kernel: scsi target1:0:2: handle(0x000b), sas_address(0x5000cca23c0d2c39), phy(2)
Sep 12 14:46:15 Excelsior kernel: scsi target1:0:2: enclosure logical id(0x500605b0098b8100), slot(3)
Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: task abort: SUCCESS scmd(0x000000001e0cffe4)

(sdi appears to be disk5.) These seem to have started on Sep 8. Did you do anything to the system then? Is there any indication in the GUI of possible problems with disk5?

I would suggest you:
- Cancel the current check, as there is no point proceeding if there are underlying hardware issues.
- Do NOT at this point attempt to run a correcting check, as with a hardware issue you are more likely to end up corrupting parity.
- Carefully check all connections (power and SATA) to disk5. Perhaps when changing drives you slightly disturbed an existing connection or did not quite perfectly seat one.

I am not sure whether at that point you should retry the non-correcting check or do something else, such as an extended test on disk5. You may want to wait to see if anyone else (in particular @JorgeB) has any other suggestions.
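To see how widespread the task-abort messages quoted above are, a syslog can be tallied per SCSI address. The following is only a minimal sketch: the function name `count_task_aborts` and the two regular expressions are illustrative assumptions written against the five sample lines in this thread, not an Unraid tool, and other kernel or driver versions may word these messages differently.

```python
import re
from collections import Counter

# Matches lines like "kernel: sd 1:0:2:0: attempting task abort!scmd(...)"
# and captures the SCSI address (host:channel:target:lun).
ABORT_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): attempting task abort")
# Matches lines like "kernel: sd 1:0:2:0: [sdi] tag#2226 CDB: ..."
# and captures the SCSI address plus the device name in brackets.
CDB_RE = re.compile(r"kernel: sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def count_task_aborts(lines):
    """Return {scsi_address: (device_name, abort_count)} for the given syslog lines."""
    aborts = Counter()
    names = {}
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            aborts[m.group(1)] += 1
            continue
        m = CDB_RE.search(line)
        if m:
            names[m.group(1)] = m.group(2)
    # "?" marks addresses for which no bracketed device name was seen.
    return {addr: (names.get(addr, "?"), n) for addr, n in aborts.items()}


# Sample input: three of the lines quoted in the post above.
SAMPLE = [
    "Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: attempting task abort!scmd(0x000000001e0cffe4), outstanding for 30528 ms & timeout 30000 ms",
    "Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: [sdi] tag#2226 CDB: opcode=0x88 88 00 00 00 00 00 9c b9 b4 40 00 00 00 08 00 00",
    "Sep 12 14:46:15 Excelsior kernel: sd 1:0:2:0: task abort: SUCCESS scmd(0x000000001e0cffe4)",
]

if __name__ == "__main__":
    for addr, (dev, n) in count_task_aborts(SAMPLE).items():
        print(f"{addr} ({dev}): {n} task abort(s)")
```

Run against a full syslog (for example by passing an open file object instead of `SAMPLE`), this would show whether the aborts are confined to one device or spread across several.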
PsionStorm (Author), posted September 15, 2023 (edited):
1 hour ago, JorgeB said: "Did you do a parity swap at this time?"
I thought I did, but I followed a different process. I added the new drive as a 2nd parity, then pulled the old one and set up a new config. I saw that recommended in a few places and didn't see the article you linked. Looks like that may have been my mistake?
1 hour ago, itimpi said: "It looks as if there may be some other problem that needs looking at first as in your syslog there are continual [...] sd 1:0:2:0: attempting task abort! [...] type messages in the syslog (sdi appears to be disk5)."
Regarding those errors, I have done extensive troubleshooting trying to resolve them and have been unable to. They occur on all of my SAS drives, but only my SAS drives. See thread here: Any advice would be appreciated. Since this post I've replaced the SAS card and have tried three different sets of cables, with no change in the result.
Edited September 15, 2023 by PsionStorm
JorgeB, posted September 15, 2023 (Solution):
17 minutes ago, PsionStorm said: "then pulled the old one and set up a new config"
And did you leave it as parity2 or assign it to parity1? They are not interchangeable.
PsionStorm (Author), posted September 15, 2023:
4 minutes ago, JorgeB said: "And did you leave it as parity2 or assigned it to parity1? They are not interchangeable."
I moved it to parity1.
JorgeB, posted September 15, 2023:
Then the sync errors are normal, just let it sync.
P.S. It will probably be much faster doing a new sync vs. a correcting check.
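As background on why a disk built as parity2 produces sync errors when assigned to parity1, here is a toy sketch. It assumes that parity1 is a plain XOR across the data disks and that parity2 is a RAID-6-style "Q" syndrome over GF(2^8) with the polynomial 0x11d; the thread itself only states that the two are not interchangeable, so treat the exact math and the helper names (`gf_mul2`, `p_parity`, `q_parity`) as illustrative assumptions.

```python
def gf_mul2(x):
    """Multiply a byte by 2 in GF(2^8), reducing by the polynomial 0x11d."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x & 0xFF


def p_parity(data):
    """XOR parity (assumed parity1): independent of disk order."""
    p = 0
    for d in data:
        p ^= d
    return p


def q_parity(data):
    """RAID-6-style Q (assumed parity2): D0 + 2*D1 + 4*D2 + ... in GF(2^8).

    Evaluated with Horner's scheme, so each disk's weight depends on its slot.
    """
    q = 0
    for d in reversed(data):
        q = gf_mul2(q) ^ d
    return q


if __name__ == "__main__":
    # One byte from each of three hypothetical data disks, in slot order.
    stripe = [0x01, 0x02, 0x03]
    print("P:", hex(p_parity(stripe)))
    print("Q:", hex(q_parity(stripe)))
    print("Q with disks reordered:", hex(q_parity(stripe[::-1])))
```

Under these assumptions the bytes stored on a parity2 disk generally differ from what a parity1 check expects, so nearly every stripe would be reported as a sync error until parity is rebuilt.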
PsionStorm (Author), posted September 15, 2023:
I believe the current check is non-correcting. It was my scheduled monthly parity check and I *think* I turned off correcting. Does it tell you in the logs?
JorgeB, posted September 15, 2023:
4 minutes ago, PsionStorm said: "I believe the current sync is non-correcting"
It is.
PsionStorm (Author), posted September 15, 2023:
So, I'll just let this one complete then. Anything you recommend doing after it's completed?
JorgeB, posted September 15, 2023:
There is not much point in letting that one complete. I would recommend doing a new sync, since it will be faster.
PsionStorm (Author), posted September 15, 2023:
Sounds good. I'll start that when I get home. Thanks!