air_marshall Posted March 26

Hi Guys, Long story short, I had some 'thermal management' issues in my 'comms room' that may have started to push some of my older drives closer to EOL. Drive sdj threw a couple of SMART errors, so I set about replacing that one first with a pre-cleared 4TB drive, sdi. During the rebuild, data drive sde threw 41 read errors, which translated to 41 data errors in the rebuild. The last parity check without errors was 07 Mar 24. See attached diagnostics. sdj is still attached to the machine and hasn't been touched since I removed it from the array. Data on that drive is largely expendable; the non-expendable data is backed up elsewhere. I have another 4TB drive waiting to be pre-cleared that is available either to replace another drive or to add to the array. What are my next steps before I make this situation worse? TIA

tower-diagnostics-20240326-0923.zip
JorgeB Posted March 26

I don't see any rebuild in the logs. When was it done?
air_marshall Posted March 26 (Author)

23 minutes ago, JorgeB said: "I don't see any rebuild in the logs, when was it done?"

Started 22 Mar 1051
Finished 22 Mar 2254
No reboot or power cycle since!

Data-Rebuild | 2024-03-22, 22:53:15 (Friday) | 4 TB | 12 hr, 2 min, 49 sec | 92,2 MB/s | OK | 41
JorgeB Posted March 26

The logs you posted start on Mar 24 12:02:24, so we can't see what happened. Based on SMART, sde appears to be failing; you can run an extended SMART test to confirm.
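For anyone who prefers the command line to the GUI, an extended test can also be started with smartctl. The following is only a minimal sketch, assuming smartctl (smartmontools) is installed, the script runs with sufficient privileges, and the drive is still /dev/sde (device letters can change between boots). The function names are made up for illustration.

```python
import subprocess

def extended_test_command(device):
    # "smartctl -t long <device>" asks the drive to start an extended (long) self-test.
    return ["smartctl", "-t", "long", device]

def selftest_log_command(device):
    # "smartctl -l selftest <device>" prints the drive's self-test log once the test is done.
    return ["smartctl", "-l", "selftest", device]

def start_extended_test(device, run=subprocess.run):
    # Hypothetical helper: runs the command and returns smartctl's text output.
    # smartctl uses non-zero exit codes as status bits, so check=True is not set here.
    result = run(extended_test_command(device), capture_output=True, text=True)
    return result.stdout

# Example (not executed here): print(start_extended_test("/dev/sde"))
```

The test itself runs inside the drive and can take many hours on a 4TB disk, so the result has to be read back later from the self-test log.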
air_marshall Posted March 26 (Author)

OK, the extended SMART test on sde is running. I will post the results. I'm doing the same for sdj, the disk I replaced in the array. Any ideas why there is such a gap in my logs? syslog-previous ends Mar 21 12:30:57. Is there anywhere else I can go looking for them? The log seems to be flooded with an nvidia error that I've not seen before!
JorgeB Posted March 26

Logs rotate automatically, and any spam will make them rotate much sooner.
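To see which message is doing the flooding, the lines of a syslog can be counted by message text. This is a rough sketch, assuming the traditional "Mon DD HH:MM:SS host message" line layout; the sample lines and the function name are placeholders for illustration, not taken from the posted diagnostics.

```python
import re
from collections import Counter

# Matches the traditional syslog prefix: month, day, time, hostname, then the message.
PREFIX = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s+\S+\s+(.*)$")

def top_messages(lines, n=5):
    """Return the n most frequent messages with their counts, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line.rstrip("\n"))
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)

# Placeholder lines (hypothetical, for illustration only).
sample = [
    "Mar 24 12:02:24 Tower spamd: repeated placeholder message",
    "Mar 24 12:02:25 Tower spamd: repeated placeholder message",
    "Mar 24 12:02:26 Tower otherd: a different placeholder message",
    "Mar 24 12:02:27 Tower spamd: repeated placeholder message",
]
print(top_messages(sample, n=1))
```

On a real system the lines would come from the log file itself, for example `open("/var/log/syslog")`, with the path adjusted to wherever the log lives.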
air_marshall Posted March 26 (Author)

The sde extended SMART report is attached. The GUI reports that the test completed without error.

tower-smart-20240326-2039.zip
JorgeB Posted March 27

The SMART test passed, so the disk is OK for now. Keep monitoring it, especially these two attributes:

ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    99
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    22

If they keep increasing you will likely get more errors, and in that case I would recommend replacing it.
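One way to keep an eye on those two attributes is to save the attribute table now and compare it with a later one. The sketch below is only an illustration: it assumes the table layout quoted in the post above, the function names are invented, and the "later" readings are hypothetical numbers, not real results from this drive.

```python
import re

# Baseline rows as quoted in the post above.
BASELINE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    99
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    22
"""

# ID, name, flags, value, worst, thresh, fail, raw value (leading integer only).
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)")

def parse_raw_values(table):
    """Map attribute ID -> (name, raw value); the header line is skipped."""
    raw = {}
    for line in table.splitlines():
        match = ROW.match(line)
        if match:
            raw[int(match.group(1))] = (match.group(2), int(match.group(8)))
    return raw

def attributes_that_grew(baseline, current, watch=(1, 200)):
    """Return (name, old_raw, new_raw) for each watched attribute whose raw value rose."""
    old, new = parse_raw_values(baseline), parse_raw_values(current)
    grown = []
    for attr_id in watch:
        if attr_id in old and attr_id in new and new[attr_id][1] > old[attr_id][1]:
            grown.append((new[attr_id][0], old[attr_id][1], new[attr_id][1]))
    return grown

# Hypothetical later reading, for illustration only.
LATER = """\
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    120
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    22
"""
print(attributes_that_grew(BASELINE, LATER))
```

An empty list means neither watched raw value has gone up since the baseline was taken.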