Netbug Posted November 18, 2021

UnRaid Version 6.9.2

I noticed that the system was responding very sluggishly, so I had a quick look, and disk 8 was showing offline. I stopped the array, removed the drive from the array, started the array, stopped it again, re-assigned the drive, started the array, and waited for the rebuild. Everything seemed fine. Then the next day, the same thing happened.

This is where I screwed up. I completely removed the drive and attempted to pre-clear it (the data wasn't super important). I left it for about 30 hours and came back to the message "Error encountered, please verify the log".

I now have a replacement drive, which I will be installing now, but I've got a few of these drives that seem to fail, and I'm not knowledgeable enough to tell whether they are actually dead. I've tried reading through the "Understanding SMART Reports" article, but I'm just not smart enough to get it.

I'm attaching logs here. My questions are:

1. Is there any way to know, from the logs and information, what happened?
2. Is there something I can purchase (like an external SATA connector) for my Windows PC so that I can use the Windows machine to check these supposedly failed drives?

Thanks.

tower-diagnostics-20211118-0549.zip
tower-syslog-20211118-1048.zip
ChatNoir Posted November 18, 2021

Drive Z5029NBR has only a very old SMART test, but the attributes do not look good (in particular #197 and #198, but also #5):

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   086   083   006    -    202923257
  3 Spin_Up_Time            PO----   095   094   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    179
  5 Reallocated_Sector_Ct   PO--CK   082   082   010    -    22944
  7 Seek_Error_Rate         POSR--   075   060   030    -    36487737
  9 Power_On_Hours          -O--CK   044   044   000    -    49544
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    177
183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   001   001   000    -    1152
188 Command_Timeout         -O--CK   097   092   000    -    14 14 31
189 High_Fly_Writes         -O-RCK   088   088   000    -    12
190 Airflow_Temperature_Cel -O---K   071   064   045    -    29 (Min/Max 25/36)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    30
193 Load_Cycle_Count        -O--CK   001   001   000    -    221440
194 Temperature_Celsius     -O---K   029   040   000    -    29 (0 13 0 0 0)
197 Current_Pending_Sector  -O--C-   099   089   000    -    280
198 Offline_Uncorrectable   ----C-   099   089   000    -    280
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
240 Head_Flying_Hours       ------   100   253   000    -    6600h+36m+56.005s
241 Total_LBAs_Written      ------   100   253   000    -    121723793480
242 Total_LBAs_Read         ------   100   253   000    -    1372755437944

You could try to run an extended SMART test, but don't get your hopes up.

As for what happened, I'm not sure myself; maybe another user with more knowledge can chime in. I'd guess that it is just an old drive (over 49,500 power-on hours).
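For readers who, like the original poster, find the attribute table hard to scan, here is a minimal Python sketch that pulls out the raw values of the three attributes ChatNoir points at (#5, #197, #198). It assumes the rows have the same eight-column layout as the table in the post; the names `ROW`, `parse_attributes`, `raw_count`, and `SAMPLE` are made up for this illustration and are not part of smartmontools or Unraid.

```python
import re

# Attribute IDs singled out in the reply above.
WATCHED_IDS = (5, 197, 198)

# One table row: ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE.
# Assumption: the six-character flags column shown in the post is present.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_attributes(text):
    """Return {attribute_id: fields} for every attribute row in the text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header line or anything that is not an attribute row
        attr_id, name, _flags, value, worst, thresh, _fail, raw = match.groups()
        attrs[int(attr_id)] = {
            "name": name,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "raw": raw,
        }
    return attrs


def raw_count(raw):
    """Leading integer of a RAW_VALUE field (some rows carry extra text)."""
    match = re.match(r"\d+", raw)
    if match is None:
        raise ValueError(f"no leading number in raw value: {raw!r}")
    return int(match.group())


# Three rows copied from the table in the post.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   082   082   010    -    22944
197 Current_Pending_Sector  -O--C-   099   089   000    -    280
198 Offline_Uncorrectable   ----C-   099   089   000    -    280
"""

attrs = parse_attributes(SAMPLE)
watched = {i: raw_count(attrs[i]["raw"]) for i in WATCHED_IDS if i in attrs}
for attr_id, count in watched.items():
    print(attr_id, attrs[attr_id]["name"], count)
```

The sketch only extracts numbers; deciding what they mean is still a judgment call, as the replies in this thread show.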
JorgeB Posted November 18, 2021

17 minutes ago, Netbug said: 1. Is there any way to know, from the logs and information, what happened?

The disk has pending sectors, so it needs to be replaced. On the other disks you can run an extended SMART test to confirm whether they are OK or not.
Netbug (Author) Posted November 18, 2021

Thank you for the replies. I still don't quite understand how to interpret those results. I'll have to dig into what Current_Pending_Sector and Offline_Uncorrectable mean and what thresholds are acceptable.

Any recommendations for a Windows utility to test drives (I know it's slightly off-topic)?
JorgeB Posted November 18, 2021

1 minute ago, Netbug said: Any recommendations for a Windows utility to test drives (I know it's slightly off-topic)?

If the disks are outside the server and you don't have enough ports there, you can install smartmontools on Windows and run a SMART test.
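As a rough illustration of the smartmontools suggestion, the Python sketch below builds the `smartctl` command lines for listing drives, starting an extended (long) self-test, and reading the results afterwards. It assumes `smartctl` is installed and on the PATH and that the terminal has administrator rights; the device name `/dev/sdb` and the helper names are placeholders invented for this example, so use whatever `smartctl --scan` reports on your machine.

```python
import subprocess


def smartctl_command(action, device=None):
    """Build the argument list for one smartctl action (nothing is executed)."""
    if action == "scan":
        return ["smartctl", "--scan"]  # list the drives smartctl can see
    if device is None:
        raise ValueError("this action needs a device name")
    commands = {
        "start_extended_test": ["smartctl", "-t", "long", device],
        "selftest_log": ["smartctl", "-l", "selftest", device],
        "full_report": ["smartctl", "-a", device],
    }
    return commands[action]


def run_smartctl(action, device=None):
    """Run smartctl and return its text output.

    smartctl uses its exit status as a bit mask of findings, so a non-zero
    status does not necessarily mean the command itself failed; that is why
    check=True is not used here.
    """
    result = subprocess.run(
        smartctl_command(action, device), capture_output=True, text=True
    )
    return result.stdout


# Hypothetical device name, for illustration only.
print(" ".join(smartctl_command("start_extended_test", "/dev/sdb")))
```

An extended test runs inside the drive and can take many hours on a large disk; the self-test log is where the outcome appears once it has finished.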
ChatNoir Posted November 18, 2021

4 minutes ago, Netbug said: I'll have to dig in to what Current_Pending_Sector and Offline_Uncorrectable mean and what thresholds are acceptable.

You can start here: https://en.wikipedia.org/wiki/S.M.A.R.T.

For #198, the only acceptable value is 0. For #197, the value should not stay above 0 for too long. The sectors should go from Pending to Reallocated (#5), but there is only a limited number of reserve sectors the drive can use.
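ChatNoir's rule of thumb can be written down as a small check. This is only a sketch of the rule as stated in the post, not an official SMART policy; the function name and the message wording are invented for the example, and the inputs are the RAW_VALUE counts of attributes #197, #198, and #5.

```python
def assess_sector_counts(pending_raw, offline_uncorrectable_raw, reallocated_raw):
    """Apply the rule of thumb from the post to three raw sector counts.

    Returns a list of findings; an empty list means nothing was flagged.
    """
    findings = []
    if offline_uncorrectable_raw != 0:
        # The post: for #198 the only acceptable value is 0.
        findings.append(
            f"#198 Offline_Uncorrectable is {offline_uncorrectable_raw}, expected 0"
        )
    if pending_raw > 0:
        # The post: #197 should not stay above 0 for long.
        findings.append(
            f"#197 Current_Pending_Sector is {pending_raw}, watch whether it clears"
        )
    if reallocated_raw > 0:
        # Pending sectors that get remapped end up in #5, drawing on a limited reserve.
        findings.append(
            f"#5 Reallocated_Sector_Ct is {reallocated_raw}, reserve sectors are being used"
        )
    return findings


# Raw values reported for drive Z5029NBR earlier in the thread.
for finding in assess_sector_counts(280, 280, 22944):
    print(finding)
```

Because "too long" is not defined in the post, the sketch only flags a non-zero pending count; tracking it over time is left to the reader.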
Squid Posted November 18, 2021

1 hour ago, Netbug said: I'll have to dig in to what Current_Pending_Sector and Offline_Uncorrectable mean and what thresholds are acceptable.

The thresholds in SMART mark the point at which the manufacturer outright states that the drive is failing. However, on some attributes they tend to be very skewed towards the manufacturer's best interests. In particular, for attribute 5, the value vs. the threshold shows that the drive is nowhere near failing. However, 23000 reallocated sectors already (and more coming) shows that the drive is basically toast.

Many users think that a single reallocated sector is grounds to replace a drive. I can deal with ~100 before I start to get worried. Over a hundred and I'll order a new drive online. At 23000, I'd be going to the nearest brick-and-mortar store.

Do you have notifications set up? The OS would presumably have warned you about this a long time ago.
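The gap Squid describes, between the manufacturer's threshold and a practical rule of thumb, can be shown with the numbers for attribute 5 from this thread. The sketch below assumes the usual SMART convention that an attribute counts as failed once its normalized VALUE drops to or below THRESH; the limit of 100 is Squid's personal comfort level from the post, and the function names are invented for this illustration.

```python
def manufacturer_says_failing(value, thresh):
    """Normalized VALUE at or below THRESH means the attribute has failed."""
    return value <= thresh


def rule_of_thumb(reallocated_raw, worry_limit=100):
    """Squid's personal heuristic on the raw reallocated sector count.

    worry_limit=100 comes from the post; it is a comfort level, not a standard.
    """
    if reallocated_raw <= worry_limit:
        return "keep watching"
    return "order a replacement"


# Attribute 5 on drive Z5029NBR: VALUE 082, THRESH 010, RAW_VALUE 22944.
value, thresh, raw = 82, 10, 22944
print("manufacturer says failing:", manufacturer_says_failing(value, thresh))
print("rule of thumb:", rule_of_thumb(raw))
```

The two checks disagree for this drive, which is the point of the post: the normalized value is still well above its threshold while the raw count is far beyond anything Squid would tolerate.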
Netbug (Author) Posted November 18, 2021

4 hours ago, Squid said: Do you have notifications set up? The OS would have warned you about this presumably a long time ago.

I don't, or I have them filtered due to spam. Is there a wiki article on configuring them? (I know, I'm bad at this.)