AshleyAitken Posted February 16, 2023

Hi All,

I'm new to Unraid and SMART (although I have set up my Unraid array and generally know about SMART). I'm testing now and learning about both of these.

I have one disk in the array that's reported 9000+ errors (particularly when setting up the array). I ran an extended self-test and it stops at 10% with "Completed: read failure".

When I look at the SMART report, though, it suggests the disk "Passed." How can that be? Is the disk usable? Unraid is not reporting a failure or a need to replace the disk.

The SMART report is attached. Thanks for any suggestions.

Cheers,
Ashley.

ST3000DM001-1ER166_Z501RTHG-20230216-0922.txt
trurl Posted February 16, 2023

Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure  90%        18441            886976
# 2  Short offline     Completed: read failure  90%        18441            886976

29 minutes ago, AshleyAitken said: "Is the disk usable?"

No.
trurl Posted February 16, 2023

Also, you should be seeing SMART warnings ( 👎 ) on the Dashboard page for that disk. And if you click on the disk to get to its Attributes, these would all be highlighted.

ID#  ATTRIBUTE_NAME          FLAGS   VALUE  WORST  THRESH  FAIL  RAW_VALUE
  5  Reallocated_Sector_Ct   PO--CK  097    097    010     -     3512
187  Reported_Uncorrect      -O--CK  001    001    000     -     147
197  Current_Pending_Sector  -O--C-  001    001    000     -     57720
198  Offline_Uncorrectable   ----C-  001    001    000     -     57720

Do you have Notifications set up to alert you immediately by email or other agent as soon as a problem is detected?
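For anyone who wants to check a saved SMART report by hand, the rows above can be picked out with a few lines of Python. This is only a sketch: the watch list and the helper names `parse_attributes` and `nonzero_watched` are made up for illustration and are not Unraid's own monitoring code. It assumes the column layout shown in the post (ID, name, flags, value, worst, thresh, fail, raw).

```python
# Attribute rows copied from the post above.
REPORT = """\
  5 Reallocated_Sector_Ct   PO--CK   097   097   010    -    3512
187 Reported_Uncorrect      -O--CK   001   001   000    -    147
197 Current_Pending_Sector  -O--C-   001   001   000    -    57720
198 Offline_Uncorrectable   ----C-   001   001   000    -    57720
"""

# Attributes whose raw count should normally stay at or near zero.
# Assumption: this set is illustrative, not Unraid's exact default list.
WATCHED = {5, 187, 197, 198}


def parse_attributes(text):
    """Return {id: (name, raw_value)} for each attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Expected columns: ID NAME FLAGS VALUE WORST THRESH FAIL RAW
        if len(fields) < 8 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = (fields[1], int(fields[7]))
    return attrs


def nonzero_watched(attrs):
    """Return sorted (id, name, raw) tuples for watched attributes with raw > 0."""
    return sorted(
        (attr_id, name, raw)
        for attr_id, (name, raw) in attrs.items()
        if attr_id in WATCHED and raw > 0
    )


flagged = nonzero_watched(parse_attributes(REPORT))
for attr_id, name, raw in flagged:
    print(f"{attr_id:>3} {name}: raw={raw}")
```

All four rows from this disk come back flagged, which matches what the Dashboard highlights.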
trurl Posted February 16, 2023

Post diagnostics if you want us to take a look at your hardware and configuration.
AshleyAitken Posted February 16, 2023

Thank you @trurl for your replies. TBH, I was overwhelmed by the SMART data and focussed on the "Passed" status.

Is this disk usable? No.

It's working as part of the Unraid array (AFAIK). I can copy files to the array and read them back, but of course that could be the other drive doing the work?

I would have thought Unraid would have told me if it was having trouble writing to or reading from a disk in the array, and that I should replace the disk? I would hope I would get more than a "warning."

And now it does seem to be doing so, although I don't believe it showed this "error" status before I had run the SMART Self-Tests. Even when I was setting up the array and there were 9000+ *warnings*, the UI dashboard didn't show anything.

Notifications do come up in Unraid and I do get emails, but again they were listed as "warnings" in the notifications and emails (AFAIK), whereas in the Main page > Array Devices they are listed as errors.

The disk does report these attributes: [screenshot] and in more detail: [screenshot]

Those "Pre-fail" entries look ominous 😞 and probably mean I should replace the disk?

But I am still confused about why SMART reports "PASSED." Surely it should be saying something like "going to fail" or "replace"?

Again, I am only learning and there's no real data on the array as yet.

Thanks for all comments and suggestions.

Edited February 16, 2023 by AshleyAitken
trurl Posted February 16, 2023

2 hours ago, AshleyAitken said: "that could be the other drive doing the work?"

If a disk isn't disabled, it is the only one doing the work for reads. If a disk is disabled, it isn't used at all and the rest of the array is doing the work for it.

2 hours ago, AshleyAitken said: "before I had run the SMART Self-Tests"

Until you ran the self-tests, some of the problems might have been on parts of the disk that hadn't been accessed yet.
trurl Posted February 16, 2023

2 hours ago, AshleyAitken said: "when I was setting up the array and there were 9000+ *warnings*"

If you haven't rebooted, diagnostics might tell us more about that.
AshleyAitken Posted February 16, 2023

Thank you again. I think I did reboot (probably more than once).

Disk 2 now also shows an error in the main dashboard. But strangely no errors show on the Main page.

The attributes for each disk show as: [screenshot]

Why do they show "Never" in the Failed column (except for airflow temperature)?
trurl Posted February 16, 2023

UDMA CRC errors are recorded by the drive firmware when it receives data that is inconsistent with its checksum. These are almost always connection problems. Usually the data is resent, so no harm is done, but you do need to fix something if they increase frequently. And often a connection problem will not result in a CRC error at all, because the drive never receives any data to checksum.

More in my next post on things you need to be concerned with and some things you can do with the webUI.
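The checksum idea can be shown in miniature. This is only an illustration: the SATA link computes its own CRC in hardware, and `zlib.crc32` is used here purely as a stand-in; `send` and `receive` are hypothetical names.

```python
import zlib


def send(payload: bytes):
    """Return the payload together with the checksum the sender computed."""
    return payload, zlib.crc32(payload)


def receive(payload: bytes, checksum: int) -> bool:
    """Return True if the received payload still matches the sender's checksum."""
    return zlib.crc32(payload) == checksum


data = b"sector contents"
frame, crc = send(data)

# Clean transfer: checksums agree.
clean_ok = receive(frame, crc)

# Flip one bit in transit, as a bad cable or connector might.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
corrupted_ok = receive(corrupted, crc)

print("clean transfer accepted:", clean_ok)
print("corrupted transfer accepted:", corrupted_ok)
```

A mismatch like the second case is what the drive counts as a CRC error before the data is resent. If the cable drops the transfer entirely, there is nothing to checksum, so the counter does not move.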
itimpi Posted February 16, 2023

10 hours ago, AshleyAitken said: "was overwhelmed by the SMART data and focussed on the "Passed" status."

In many cases the overall Passed status is meaningless. It only changes if one of the attributes has a "Failing Now" status. The best indication of drive health is the Extended SMART test. If this cannot complete without error then you should be replacing the drive.
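To see why this disk still says "Passed", it helps to compare the normalized VALUE column against THRESH rather than looking at the raw counts. The sketch below uses the conventional rule that an attribute is failing when its normalized value is at or below its threshold. That is a simplified model: the drive firmware makes the actual overall call, and `failing_now` is a hypothetical helper.

```python
# (id, name, normalized value, threshold, raw value), taken from the
# attribute rows posted earlier in the thread.
ATTRIBUTES = [
    (5, "Reallocated_Sector_Ct", 97, 10, 3512),
    (187, "Reported_Uncorrect", 1, 0, 147),
    (197, "Current_Pending_Sector", 1, 0, 57720),
    (198, "Offline_Uncorrectable", 1, 0, 57720),
]


def failing_now(value: int, thresh: int) -> bool:
    """Conventional rule: failing when the normalized value is <= threshold."""
    return value <= thresh


failing = [name for _, name, value, thresh, _ in ATTRIBUTES
           if failing_now(value, thresh)]
overall = "FAILED" if failing else "PASSED"

print("failing attributes:", failing)
print("overall status:", overall)
```

Even with 57720 pending sectors, the normalized value of 1 is still above a threshold of 0, so nothing counts as failing under this rule and the overall status stays at PASSED. That is why the self-test result and the raw counts are the numbers to watch.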
trurl Posted February 16, 2023

In general, you can always Acknowledge a SMART warning on the Dashboard page by clicking on it ( 👎 ) and it will warn again if it increases.

Some are more important than others. An occasional CRC error I usually just acknowledge, and maybe just check and reseat the connection next time I need to open the case. More frequent CRC errors need to be taken care of.

The screenshot you posted earlier with the checkboxes shows which Attributes get monitored. You can set these for all disks in Disk Settings, and override those settings for an individual disk in its settings.

A disk has some sectors reserved to replace bad sectors. That is what reallocation is about. A few reallocated sectors are usually OK as long as the count isn't increasing.

Pending sectors are sectors that will be reallocated when they are written again. These are a little more worrisome because it means the data at that sector can't be reliably read. You can ensure these get written by rebuilding the disk.

You can add to the list of SMART attributes for monitoring in Disk Settings or in the settings for an individual disk. Some disk manufacturers use the attributes a little differently. It is recommended that you add attributes 1 and 200 for WD disks.
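The acknowledge-then-warn-again behaviour can be modelled in a few lines. This is a guess at the logic for illustration only, not Unraid's actual implementation; `AttributeMonitor` and its methods are hypothetical names.

```python
class AttributeMonitor:
    """Toy model: warn on a nonzero raw value until it is acknowledged,
    then stay quiet unless the value rises above the acknowledged level."""

    def __init__(self):
        # Attribute id -> raw value at the time the user acknowledged it.
        self.acknowledged = {}

    def acknowledge(self, attr_id: int, raw: int) -> None:
        self.acknowledged[attr_id] = raw

    def should_warn(self, attr_id: int, raw: int) -> bool:
        # A zero count is never a warning; otherwise warn only when the
        # count exceeds what has already been acknowledged.
        return raw > self.acknowledged.get(attr_id, 0)


monitor = AttributeMonitor()
first = monitor.should_warn(199, 3)      # new CRC errors appear
monitor.acknowledge(199, 3)              # user clicks Acknowledge
quiet = monitor.should_warn(199, 3)      # count unchanged
again = monitor.should_warn(199, 4)      # count increased

print(first, quiet, again)
```

Attribute 199 is used here because it is the usual ID for the UDMA CRC error count. The same pattern would apply to any monitored attribute.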
AshleyAitken Posted February 20, 2023

I must say I am impressed with the look and feel of the Unraid web UI but somewhat disappointed with the vagueness (at least it seems so to me) and confusion with regard to disk / array status.

Here's another example: Disk 2, which has been the better disk so far, has now had 22 errors/warnings and shows this in the dashboard.

Why is it "disabled" when SMART is showing healthy (after errors and warnings)?

As an end user I would really like to know what Unraid's best understanding is of the overall status of each disk, e.g. healthy, failing, replace asap, and failed, and similarly for the array, e.g. healthy, needs repairing, repairing. Nothing more... (unless I go to an advanced page etc).

IMHO, having to understand and research different disk errors and try to work out what the UI is telling me (without popups explaining different values and information etc) is not good.
trurl Posted February 20, 2023

It might be useful to think of warnings as more important than errors.

An error typically means a specific thing has failed in a specific way. How important that is depends on the details.

A warning means there is something that deserves your attention. How to deal with that depends on the details.

1 hour ago, AshleyAitken said: "Why is it "disabled" when SMART is showing healthy"

Unraid disables a disk when a write to it fails for any reason. Often this isn't a problem with the disk, but a problem communicating with the disk. That failed write updates parity, so it can be recovered by rebuilding, but the physical disk isn't used again until rebuilt since it is now out-of-sync with the array.

After disabling, the disk is emulated from parity. When reading an emulated disk, the data comes from the parity calculation by reading all other disks. When writing an emulated disk, parity is updated as if the disk had been written. The initial failed write, and any subsequent writes to the disabled/emulated disk, can be recovered by rebuilding.

https://wiki.unraid.net/Manual/Overview#Parity-Protected_Array

1 hour ago, AshleyAitken said: "having to understand and research"

Unfortunately, there are a lot of details to consider, and trying to put them all in the webUI would make for an incomprehensible user interface. And, even with all the information, it might be difficult for an inexperienced user to choose the best approach for resolving a problem. So, please ask on the forum if you are unsure how to proceed.

There is a lot more information in the wiki, which you can access from the Documentation link at the bottom of the forum, or from your Unraid webUI by clicking "manual" at lower right.

1 hour ago, AshleyAitken said: "Why is it "disabled""

The best way to answer that question is by examining the diagnostics, taken before reboot. The diagnostics contain the current syslog, which is in RAM like the rest of the OS. Diagnostics after reboot can tell us a lot about how things are, but they may not be able to tell us much about how they got that way.

Diagnostics also include SMART reports for all connected disks and other useful information about your hardware and configuration.

Whether or not you have rebooted since disk2 became disabled:

Attach new diagnostics to your NEXT post in this thread.
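The emulation described above can be sketched with plain XOR parity on a toy two-data-disk array. This assumes single parity computed as a byte-wise XOR across the data disks, which is how Unraid's first parity disk is generally described. The disk contents and the `xor_blocks` helper are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy array: two data disks and one parity disk, three bytes each.
disk1 = b"\x0f\xa0\x55"
disk2 = b"\xf0\x0a\x33"
parity = xor_blocks([disk1, disk2])

# disk2 gets disabled. A read of disk2 is emulated from disk1 and parity.
emulated_disk2 = xor_blocks([disk1, parity])

# A write to the emulated disk2 only updates parity, "as if" disk2
# had been written; the physical disk2 is not touched.
new_disk2 = b"\x01\x02\x03"
parity = xor_blocks([disk1, new_disk2])

# Rebuilding onto a replacement recovers the latest contents, including
# the write that happened while the disk was emulated.
rebuilt_disk2 = xor_blocks([disk1, parity])

print(emulated_disk2 == disk2, rebuilt_disk2 == new_disk2)
```

The same arithmetic shows why a rebuild needs every other disk to read reliably: each rebuilt byte depends on the matching byte from all of them.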
trurl Posted February 20, 2023

You have to rebuild disk2, but it would be useful to see your diagnostics first.
Decto Posted February 20, 2023

On 2/16/2023 at 1:36 AM, AshleyAitken said: "I have one disk in the array that's reported 9000+ errors (particularly when setting up the array)."

Hi, on a side note, that specific model of 3TB drive is somewhat notorious for a very high failure rate, around 32% for Backblaze.

Given the age of the drives, you may want to consider replacing them or ensuring you have an independent backup of anything irreplaceable.

Wiki ST3000DM001
ExtremeTech
AshleyAitken Posted February 24, 2023

On 2/21/2023 at 1:27 AM, trurl said: "You have to rebuild disk2, but it would be useful to see your diagnostics first."

Thank you, so disabled means the disk has "failed" (or got out of whack somehow) and the array is running without it. Interesting, I would have thought that would have been a more significant event and made headline news on the Dashboard and Main (disks) page.

Here are my diagnostics, but please don't waste too much time on them, because based on your advice (and thanks for the heads-up on those disks) I am going to replace the disks. They are old but haven't been used for a number of years.

media-diagnostics-20230224-1930.zip

Also, somewhat strange (IMHO) is that there is no clear direction from the web UI on what to do in this case (the whole reason for having Unraid?). There is, however, some nice documentation giving the procedure, which is relatively simple. Let's see how I go...
AshleyAitken Posted February 24, 2023

I shut down the array. Set the disk to "No device." Then tried to set the disk back to the disk that "failed." It seemed to set temporarily, but then there were some notifications (sorry, I lost those after reboot) and that disk disappeared from the drop-down and is no longer in the list of "unassigned devices."

I rebooted and it still doesn't appear, so perhaps that disk has completely died 😞
AshleyAitken Posted February 24, 2023

Started the array again (with Disk 2 as No Device) and it is still functioning without the second disk.

Stopped the array again and the second disk is still nowhere to be seen.

I will hopefully get some new disks this weekend.

Thanks for your help and guidance!
trurl Posted February 24, 2023

SMART for parity looks OK with not many power-on hours.

Both disk1 and disk2 need to be replaced, but you can only rebuild one disk at a time. To reliably rebuild a disk, Unraid must be able to reliably read all other disks. Disk1 may not work well enough to rebuild disk2, and disk2 isn't working at all.

Neither disk has much data yet. If you don't need any of the data it will be simpler to just start over with new disks.
AshleyAitken Posted March 1, 2023

Yes, minimal data, this is just a trial to get to know UnRaid.

I am replacing both disks (one at a time) now, as a learning experience. I will see how it goes, and if it doesn't work out I will start over.

Thanks again.
AshleyAitken Posted March 3, 2023

FYI, I replaced the disks one after the other, letting the system rebuild one disk before doing the other, and now it's all green lights.

Now going to explore Docker apps... see if this very old hardware (without address mapping) can run it and handle any load.

Overall, I'm generally impressed with UnRaid and thankful for the support I received here.

[Apologies for any dupes... I wasn't logged in and didn't realise it would post but be moderated.]