Kurtains Posted August 1, 2023 (edited)

Hi all,

I haven't had this issue before: my parity check stalled about 12 hours in. It runs on the first of each month and typically takes ~22 hours. I got two errors at the same time: a Warning (Array has 1 disk with read errors) and an Alert (Disk 12 in error state disk dsbl). The disk now shows up under Unassigned Devices with just a Format button.

Nothing in the syslog pointed me to the root cause, so I wanted to check here before trying to restart the array without the drive selected and then starting it again with the drive selected to perform a rebuild/sync. Currently the page hangs when I try to cancel the stalled parity check, and I'm hoping to avoid a forced shutdown/restart.

Diagnostics are attached, along with a SMART report for disk 12 from a couple of weeks ago.

Thanks

Edit: I had to shut down the machine because the 'main' page was unresponsive and I couldn't cancel the stalled parity check via the parity.check plugin. Disk 12 was rather noisy when booting back up. The SMART test finished with "completed: read failure", but the report still says passed. Attaching the new SMART report.

citadel-diagnostics-20230802-0630.zip
ST8000NE001-2M7101_WSDA5J92-20230717-1323.txt
ST8000NE001-2M7101_WSDA5J92-20230802-0906.txt

Edited August 1, 2023 by Kurtains
JorgeB Posted August 2, 2023

This looks more like a power/connection issue, but the first error is logged as a disk problem, so it's a good idea to run an extended SMART test on disk12.
Kurtains Posted August 2, 2023 (Author)

Why do you think it's a power/connection issue? I recently checked the connections after doing some dusting inside the rack and case. It also seems odd that only one drive connected to the backplane would have issues, and not several or the surrounding drives as well, right?

The first (oldest) of the two attached SMART reports is an extended one from not long ago. Will an extended test even complete if I can't currently run a short test? The short test finishes moments after starting with "completed: read failure".
JorgeB Posted August 2, 2023

1 hour ago, Kurtains said: "Why do you think it's a power/connection issue?"

The disk dropped offline, and this is most often a power/connection problem, but as mentioned:

1 hour ago, JorgeB said: "first error is logged as a disk problem, so good idea to run an extended SMART test on disk12."
Kurtains Posted August 2, 2023 (Author)

The extended test completed with a read failure. The report, which still says passed, is attached.

ST8000NE001-2M7101_WSDA5J92-20230803-0311.txt
itimpi Posted August 2, 2023 (Solution)

If the extended SMART test fails, then the drive should be replaced.
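
The thread turns on a report that says "passed" overall while the self-test itself ended in "read failure". As a rough sketch of how the two can be checked separately, the Python below scans a saved SMART text report for the overall-health line and for failed self-test log entries. The sample text is invented for illustration and is not taken from the attached reports. The function names are hypothetical, and the layout assumed is the usual ATA text output of smartctl, which may differ by drive type and smartmontools version.

```python
import re

# Invented sample in the style of an ATA smartctl text report.
# This is NOT copied from the reports attached to the thread.
SAMPLE_REPORT = """\
SMART overall-health self-assessment test result: PASSED
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20001         123456789
# 2  Short offline       Completed without error       00%     19000         -
"""


def overall_health(report):
    """Return the overall-health verdict (e.g. 'PASSED'), or None if absent."""
    match = re.search(r"self-assessment test result:\s*(\S+)", report)
    return match.group(1) if match else None


def failed_self_tests(report):
    """Return self-test log lines whose status mentions a read failure."""
    failures = []
    for line in report.splitlines():
        # Self-test log entries are assumed to start with '# <number>'.
        if re.match(r"#\s*\d+\s", line) and "read failure" in line.lower():
            failures.append(line.strip())
    return failures


def should_replace(report):
    """Apply the rule from the thread: a failed self-test means replace the drive."""
    return bool(failed_self_tests(report))


if __name__ == "__main__":
    print("Overall health:", overall_health(SAMPLE_REPORT))
    print("Failed self-tests:", failed_self_tests(SAMPLE_REPORT))
    print("Replace drive:", should_replace(SAMPLE_REPORT))
```

The check deliberately ignores the overall-health line when deciding on replacement, since that verdict can still read passed after a self-test has logged a read failure, as happened here.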