jsmontague Posted March 10, 2020
Just got my first unRAID build stood up using an SC846 case and the internals and disks of my old "cold" backup storage server. All my main data is on a Windows Storage systems array with no redundancy, so although I want to get this guy done right, I am also a little freaked out by the sheer amount of time it takes to get the array up, since I'm running with no backup should a drive fail in my current server.
That being said, I've got my old 4TBs and a shucked 6TB installed, the pool built, and I just completed a parity check. Both times it finds a ton of errors, but the SMART test is passing on the disks. Can anyone help a noob out here and tell me which disks need to go? The last thing I want is to have to remove/replace a drive or even rebuild the pool in the middle of migrating data.
I have two 6TB drives on the way today, and once I kill my current server I plan on adding one of its 6TB drives as 2nd parity for N+2 drive protection.
Diag attached: tower-diagnostics-20200310-0603.zip
JorgeB Posted March 10, 2020
Disk1 appears to be failing; you can run an extended SMART test.
itimpi Posted March 10, 2020
Disk2 also started showing some read errors, so it should probably have the extended SMART test run on it as well.
JorgeB Posted March 10, 2020
3 minutes ago, itimpi said: Disk2 also started showing some read errors
Yep, missed that one.
jsmontague Posted March 10, 2020 Author
I just kicked off SMART tests on Disk1 and Disk2 and will post back. Thanks!!
jsmontague Posted March 10, 2020 Author
alexandria-diagnostics-20200310-1755.zip
Not sure if the diagnostics now hold the extended SMART test data or not? I can't see anything different when I pull each drive's report. Both show "passed" in the dashboard.
JorgeB Posted March 11, 2020
Both SMART tests completed successfully. When there are no pending sectors, these errors can sometimes be intermittent. I suggest replacing/swapping cables on both disks (power and SATA cables), then trying again. Post new diags if there are more errors.
itimpi Posted March 11, 2020
Disk2 still does not look that good, as it shows 156 pending sectors, which will cause read-error-type problems if they do not end up being reallocated.
JorgeB Posted March 11, 2020
4 minutes ago, itimpi said: as it shows 156 pending sectors
Yes, but since the SMART test passed those are "false positives". I'm still pretty sure the read errors on both disks are disk related and that they will likely fail again soon, but it's still good to rule out connection issues, since in rare cases they are logged as media/UNC errors.
jsmontague Posted March 11, 2020 Author
These are directly connected disks in an SC846 chassis with a stock backplane, so there are no cables in the system other than the SAS connectors to the LSI card. They previously lived in a Windows-based system with direct SATA cables to the motherboard. Are those SMART numbers left over from before the migration?
I've read the guide to understanding SMART, which helped me a little. Are there any other guides/info you recommend reading?
I am adding a second parity disk, so for now I think my plan will be to run them and keep a disk on standby should either give up the ghost.
JorgeB Posted March 11, 2020
Besides the normal pending/reallocated sector attributes, on WD drives there are a couple more that should be monitored:

ID#  ATTRIBUTE_NAME         FLAGS   VALUE  WORST  THRESH  FAIL  RAW_VALUE
  1  Raw_Read_Error_Rate    POSR-K  200    200    051     -     51
200  Multi_Zone_Error_Rate  ---R--  200    200    000     -     26

Both should be 0, or close to that, on a healthy drive. Higher numbers are usually bad news (especially if they keep climbing) and the disk will likely return read errors sooner or later, but there are exceptions, such as disks that give a few errors and then work fine for some time.
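As a rough illustration of the check described in this post, here is a minimal Python sketch that pulls the raw values out of SMART attribute rows and flags the two attributes named above when they are above 0. The function names, the `WATCHED` set, and the idea of feeding it pasted text are assumptions made for the example; the sample rows are the ones from the post.

```python
# Attributes the post above says to watch on WD drives (raw value should be 0 or close to it).
WATCHED = {"Raw_Read_Error_Rate", "Multi_Zone_Error_Rate"}

# Sample rows copied from the post above; in practice this text would be pasted
# from a SMART report in the same column layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR-K   200   200   051    -    51
200 Multi_Zone_Error_Rate   ---R--   200   200   000    -    26
"""


def parse_attributes(text):
    """Return {attribute_name: raw_value} for lines that look like attribute rows."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and ends with an integer raw value;
        # the header row ("ID# ...") fails the first test and is skipped.
        if len(fields) >= 4 and fields[0].isdigit() and fields[-1].isdigit():
            raw[fields[1]] = int(fields[-1])
    return raw


def flag_nonzero(raw, watched=WATCHED):
    """Keep only the watched attributes whose raw value is above 0."""
    return {name: value for name, value in raw.items() if name in watched and value > 0}


flagged = flag_nonzero(parse_attributes(SAMPLE))
for name, value in sorted(flagged.items()):
    print(f"{name}: raw value {value} (expected 0 or close to it)")
```

Both sample attributes get flagged here. As the post notes, a low non-zero value is not automatically a failure; the trend over time matters more than a single reading.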
JorgeB Posted March 11, 2020
15 minutes ago, jsmontague said: These are directly connected disks in a SC846 chassis with a stock backplane so no cables in the system outside the SAS connectors to the LSI card.
Forgot to say, you can still swap backplane slots to rule that out, but as mentioned I don't think that is the problem.
jsmontague Posted March 11, 2020 Author
Which field should be 0 or close to it?
jsmontague Posted March 11, 2020 Author
Is that the same for all manufacturers? I'm currently adding in an old Seagate drive that is showing the following 😬😬

#  ATTRIBUTE NAME       FLAG    VALUE  WORST  THRESHOLD  TYPE      UPDATED  FAILED  RAW VALUE
1  Raw read error rate  0x000f  114    099    006        Pre-fail  Always   Never   69344592
jsmontague Posted March 11, 2020 Author
Reading the link below, it says "PLEASE completely ignore the RAW_VALUE number!" for Raw_Read_Error_Rate.
https://wiki.unraid.net/Understanding_SMART_Reports
JorgeB Posted March 11, 2020
16 minutes ago, jsmontague said: "PLEASE completely ignore the RAW_VALUE number!" for Raw_Read_Error_Rate.
That's for Seagate drives: https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
jsmontague Posted March 11, 2020 Author
21 minutes ago, johnnie.black said: That's for Seagate drives: https://forums.unraid.net/topic/86337-are-my-smart-reports-bad/?do=findComment&comment=800888
I'm not following. The link I sent says not to read that variable for any manufacturer except Seagate. All of mine above are WD, so that variable isn't used and I should instead be reading the "value" column. Is the wiki wrong, or should it not be followed?
JorgeB Posted March 11, 2020
The link explains how to read the actual errors on Seagate drives; just looking at the total RAW value is pointless for those. That is not the case for WD drives, where the RAW value is the actual number of errors, so 0 = good, anything above 0 not so good, though low values can be OK.
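To make the Seagate point concrete, here is a small Python sketch of one commonly described way to read that raw value. It is an assumption not spelled out in this thread and not checked against the linked post: the raw value is treated as a 48-bit number whose upper 16 bits count errors and whose lower 32 bits count operations. The function name is made up for the example, and the input is the Seagate raw value posted earlier in the thread.

```python
def split_seagate_raw(raw_value):
    """Split a raw value into (error_count, operation_count).

    ASSUMPTION (not confirmed by this thread): the value is 48 bits wide,
    the upper 16 bits are the error count and the lower 32 bits are the
    number of operations.
    """
    raw_value &= (1 << 48) - 1          # keep only the low 48 bits
    errors = raw_value >> 32            # upper 16 bits
    operations = raw_value & 0xFFFFFFFF  # lower 32 bits
    return errors, operations


# Raw_Read_Error_Rate raw value from the Seagate drive posted earlier in the thread.
errors, operations = split_seagate_raw(69344592)
print(f"errors={errors}, operations={operations}")
```

Under that assumption, 69344592 is smaller than 2^32, so the whole number falls in the operations part and the error part is 0. That is consistent with the advice that the total RAW value on its own says little for Seagate drives.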
jsmontague Posted March 11, 2020 Author
A couple of pronouns here may be causing me to misunderstand you, so maybe I'll start over to help.
https://wiki.unraid.net/Understanding_SMART_Reports explains how to read the SMART table. It says to ignore WD hard drives' RAW value on the "Raw_Read_Error_Rate" field. So if the value below "51" for example is ignored as this is a WD drive, what value do I keep my eye on for incrementing to help point to the drive failing?

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID#  ATTRIBUTE_NAME       FLAGS   VALUE  WORST  THRESH  FAIL  RAW_VALUE
  1  Raw_Read_Error_Rate  POSR-K  200    200    051     -     51
trurl Posted March 11, 2020
3 minutes ago, jsmontague said: what value do I keep my eye on
Unraid monitors SMART attributes for you. You should set up Notifications to alert you immediately by email or other agent as soon as a problem is detected.
In Settings - Disk Settings, you can specify which SMART attributes are monitored for all disks. You can override that setting for specific disks by clicking on the disk to get to its settings page. Unraid will notify you when any monitored SMART attribute changes.
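The "notify when a monitored attribute changes" idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Unraid implements it; the function name, the monitored list, and the "current" readings are hypothetical, while the "previous" raw values are the ones posted earlier in the thread.

```python
def changed_attributes(previous, current, monitored):
    """Return {name: (before, after)} for monitored attributes whose raw value changed."""
    changes = {}
    for name in monitored:
        before = previous.get(name)
        after = current.get(name)
        # Only report attributes present in both snapshots with differing values.
        if before is not None and after is not None and after != before:
            changes[name] = (before, after)
    return changes


# Hypothetical choice of attributes to watch for this example.
MONITORED = ["Raw_Read_Error_Rate", "Multi_Zone_Error_Rate"]

# Raw values from the WD disk posted earlier in the thread.
previous = {"Raw_Read_Error_Rate": 51, "Multi_Zone_Error_Rate": 26}
# HYPOTHETICAL later reading, invented purely to show a change being detected.
current = {"Raw_Read_Error_Rate": 58, "Multi_Zone_Error_Rate": 26}

changes = changed_attributes(previous, current, MONITORED)
for name, (before, after) in changes.items():
    print(f"{name} changed: {before} -> {after}")
```

Comparing snapshots rather than checking a single reading matches the advice above: a value that keeps climbing is the stronger warning sign.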
JorgeB Posted March 11, 2020
8 minutes ago, jsmontague said: It says to ignore WD hard drives RAW value on the "Raw_Read_Error_Rate" field.
That's incorrect and should be updated.
9 minutes ago, jsmontague said: So if the value below "51" for example is ignored as this is a WD drive,
It shouldn't be ignored; that's the value you need to keep an eye on.
jsmontague Posted March 11, 2020 Author
Thanks, that's what I thought you were saying, but I wanted to be 100% certain! I'll get notifications going as well!