DrBobke Posted January 10, 2022 Hi all, for some months now I have been getting SMART errors on two drives (one parity drive and one array drive). I ran the SMART short self-test on both and it found no issues. I then tried the extended self-test. At first I ran it on both drives simultaneously, but it stays stuck at 10% (it starts at 10% and never progresses). After a few days I killed the extended test on the array drive and let it run only on the parity drive. A few days later it was still at 10%, so I killed that test too. I rebooted my PC and Unraid and started another extended test, which has now been running for 13 days and 21 hours and is still stuck at 10%.

Does anyone know why it doesn't want to progress further? It's driving me crazy. A few weeks ago I even noticed that, after I had rebooted my PC, the self-test read 'aborted by user'. That seems strange, as the self-test runs inside Unraid itself and there is no reason for it to abort just because I rebooted my PC, no?

I have read different articles on the forum (this one was most helpful), but it seems I cannot even get the test to run all the way through.

The parity drive is only a few months old and the HDD is just over 1 year old, but as said, it has been like this since the beginning, or at least for several months. Both drives (like all drives in the array) were pre-cleared and ready to go. I bought them new from suppliers I trust, and they were never deployed in another machine. The parity drive in question is my second parity drive, in case that helps.

Hope someone can help. Thanks in advance! Best regards,
JorgeB Posted January 10, 2022 Make sure that spin down is disabled, though that usually interrupts the test.
DrBobke Posted January 10, 2022 Thanks a lot, the disks were indeed spun down. Very strange, as I would have assumed the disks had operations running and so would not spin down. I have killed the self-test and restarted it. Hopefully it will move along this time around.
DrBobke Posted January 10, 2022 After making sure the disks now keep running, and after killing and restarting the extended SMART test, it is still stuck at 10% (it has been over 5 hours). The parity drive is 14TB, but still, I should be seeing some progress by now, no?
Vr2Io Posted January 10, 2022 Please provide diagnostics.
DrBobke Posted January 11, 2022 Hereby attaching the diagnostics file. As I read in other posts, it would be helpful to note which drives are affected: Parity 2 (sdf) Z030A010FP7G (14TB) and Disk 3 (sdg) 6050A07RFPBG (12TB). Any help is greatly appreciated! leonore-diagnostics-20220111-0906.zip
DrBobke Posted January 11, 2022 I don't know how or why, but all of a sudden this went from 10% (I checked this morning) to 'completed without error' when I checked just now. I am posting the outcome for Parity drive II here. I will post the same results when disk 3 clears (if that happens). leonore-smart-20220111-1901.zip
Vr2Io Posted January 11, 2022 1 hour ago, DrBobke said: "completed without error"
The extended test is a self-test run by the disk itself; Unraid only polls the disk for its status periodically. If the progress appears stuck (rather than the test being aborted), it may be a polling issue. Anyway, since the test has now succeeded, there is no problem. Unraid has a SMART polling timer setting; what is its current value? Test time should be around 0.5 TB per hour, so if a 14 TB disk needs much more than 28 hours, something is still abnormal.
Edited January 11, 2022 by Vr2Io
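As a quick sanity check on the rule of thumb in the post above, the arithmetic can be written out. This is only a sketch: the 0.5 TB/hour rate is the rough figure quoted in the thread, not a drive specification, and the function name is made up for illustration.

```python
def expected_extended_test_hours(capacity_tb: float, tb_per_hour: float = 0.5) -> float:
    """Rough extended self-test duration in hours.

    Assumption: the ~0.5 TB/hour rule of thumb from the thread; real drives vary.
    """
    if capacity_tb < 0 or tb_per_hour <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    return capacity_tb / tb_per_hour


if __name__ == "__main__":
    # Drive sizes as stated in the thread.
    for label, size_tb in (("Parity 2", 14), ("Disk 3", 12)):
        hours = expected_extended_test_hours(size_tb)
        print(f"{label} ({size_tb} TB): roughly {hours:.0f} hours")
```

With these numbers the 14 TB parity drive comes out at about 28 hours and the 12 TB disk at about 24 hours, so a test sitting at 10% for days is well outside the expected range.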
DrBobke Posted January 12, 2022 I'm pretty sure it has taken longer than 28 hours. Also, the SMART status on the disk still shows as error, in orange with a thumbs down. Disk 3 (12TB) is still clearing and also stuck at 10% (I don't see CPU or other parameters shoot up unexpectedly). I don't know where I can find the SMART polling timer setting. Do you mean "SMART capabilities: 0x0003 Saves SMART data before entering power-saving mode. Supports SMART auto save timer."? I am sure I wouldn't have touched any such setting, so it should be at its default, if there is one...
JorgeB Posted January 12, 2022 31 minutes ago, DrBobke said: "I don't know where I can find the SMART polling timer setting"
In the SMART report: "Extended self-test routine recommended polling time: (1324) minutes." It will take longer if the disk is being used during the test.
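To put the figure from the SMART report into hours, a small sketch like the following could pull the number out of the report text. It assumes the report wording matches the line quoted in the post; the function name is hypothetical and the pattern may need adjusting for other drives or smartmontools versions.

```python
import re
from typing import Optional

# Assumption: the report contains the phrase exactly as quoted in the thread,
# possibly wrapped across lines (hence \s+ between the words).
POLLING_RE = re.compile(
    r"Extended self-test routine\s+recommended polling time:\s*\(\s*(\d+)\)\s*minutes"
)


def extended_polling_minutes(report: str) -> Optional[int]:
    """Return the recommended polling time in minutes, or None if not found."""
    match = POLLING_RE.search(report)
    return int(match.group(1)) if match else None


# The line quoted in the thread.
SAMPLE_REPORT = "Extended self-test routine recommended polling time: (1324) minutes."

if __name__ == "__main__":
    minutes = extended_polling_minutes(SAMPLE_REPORT)
    if minutes is not None:
        hours, rest = divmod(minutes, 60)
        print(f"{minutes} minutes is {hours} h {rest} min")
```

For the value in the thread, 1324 minutes works out to 22 hours and 4 minutes on an idle disk.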
DrBobke Posted January 12, 2022 The disk is not being used during the test. Yesterday evening my weekly Parity-Check started, which will end in a few hours, but that was after the Parity II test ended. Is there something I can/should change there? And can you tell me why the SMART status in the Dashboard remains on Error, or what I can do to fix that? As said, the disk is less than 1 year old; the other one is also still in warranty at just over 1 year old.
Michael_P Posted January 12, 2022 If the drive passed and you're OK with it, click the thumb and acknowledge the error.
Vr2Io Posted January 12, 2022 2 hours ago, DrBobke said: "Also, the SMART status on the disk still shows as error"
That is because attributes 5 and 196 have non-zero raw values, which is not a good sign. If the disk is under warranty, I would RMA it. For reference, almost all of my disks are problem-free even at 5+ years old and only 2 have problems, although I have also had a hard time in the past with many disks developing trouble.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     PO-R--  100   100   050    -    0
  2 Throughput_Performance  P-S---  100   100   050    -    0
  3 Spin_Up_Time            POS--K  100   100   001    -    7721
  4 Start_Stop_Count        -O--CK  100   100   000    -    799
  5 Reallocated_Sector_Ct   PO--CK  100   100   050    -    8
  7 Seek_Error_Rate         PO-R--  100   100   050    -    0
  8 Seek_Time_Performance   P-S---  100   100   050    -    0
  9 Power_On_Hours          -O--CK  084   084   000    -    6427
 10 Spin_Retry_Count        PO--CK  100   100   030    -    0
 12 Power_Cycle_Count       -O--CK  100   100   000    -    6
 23 Helium_Condition_Lower  PO---K  100   100   075    -    0
 24 Helium_Condition_Upper  PO---K  100   100   075    -    0
191 G-Sense_Error_Rate      -O--CK  100   100   000    -    0
192 Power-Off_Retract_Count -O--CK  100   100   000    -    0
193 Load_Cycle_Count        -O--CK  100   100   000    -    908
194 Temperature_Celsius     -O---K  100   100   000    -    42 (Min/Max 25/54)
196 Reallocated_Event_Count -O--CK  100   100   000    -    1
197 Current_Pending_Sector  -O--CK  100   100   000    -    0
198 Offline_Uncorrectable   ----CK  100   100   000    -    0
199 UDMA_CRC_Error_Count    -O--CK  200   200   000    -    0
220 Disk_Shift              -O----  100   100   000    -    100925444
222 Loaded_Hours            -O--CK  097   097   000    -    1407
223 Load_Retry_Count        -O--CK  100   100   000    -    0
224 Load_Friction           -O---K  100   100   000    -    0
226 Load-in_Time            -OS--K  100   100   000    -    584
240 Head_Flying_Hours       P-----  100   100   001    -    0

2 hours ago, DrBobke said: "I don't know where I can find the SMART polling timer setting"
Try checking the setting below; I set it to 600, which means polling every 10 minutes. Please note that all the extended tests were "Aborted by host" and never completed.

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Aborted by host          90%        6231             -
# 2  Extended offline  Aborted by host          90%        6203             -
# 3  Extended offline  Aborted by host          90%        6089             -
# 4  Extended offline  Aborted by host          90%        6083             -
# 5  Short offline     Completed without error  00%        6081             -

A normal test result looks like the one below.
Edited January 12, 2022 by Vr2Io
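The check described in this post (looking for non-zero raw values on attributes such as 5 and 196) can be sketched in a few lines. This is an illustration only: it assumes the eight-column attribute layout shown in the post (ID, name, flags, value, worst, thresh, fail, raw value), and the watch list beyond 5 and 196 is an assumption, not something the thread prescribes.

```python
from typing import Dict, List, Tuple

# Assumption: IDs 5 and 196 are the ones called out in the thread;
# 197, 198 and 199 are added here purely for illustration.
WATCHED_IDS = {5, 196, 197, 198, 199}


def parse_attributes(table: str) -> Dict[int, Tuple[str, str]]:
    """Map attribute ID -> (name, first token of RAW_VALUE).

    Assumes rows shaped like the table in the post:
    ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE [extra text]
    """
    attributes: Dict[int, Tuple[str, str]] = {}
    for line in table.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[0].isdigit():
            continue  # header, blank line, or unrelated text
        attributes[int(fields[0])] = (fields[1], fields[7])
    return attributes


def nonzero_watched(attributes: Dict[int, Tuple[str, str]]) -> List[str]:
    """List watched attributes whose raw value is a non-zero integer."""
    findings = []
    for attr_id in sorted(attributes):
        name, raw = attributes[attr_id]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) != 0:
            findings.append(f"{attr_id} {name} = {raw}")
    return findings


# A few rows copied from the table in the thread.
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK  100   100   050    -    8
194 Temperature_Celsius     -O---K  100   100   000    -    42 (Min/Max 25/54)
196 Reallocated_Event_Count -O--CK  100   100   000    -    1
197 Current_Pending_Sector  -O--CK  100   100   000    -    0
"""

if __name__ == "__main__":
    for finding in nonzero_watched(parse_attributes(SAMPLE_TABLE)):
        print(finding)
```

Other smartctl output modes use a different column layout, so the field index for the raw value would need to change there.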
DrBobke Posted January 13, 2022 Thanks a lot for your information. Disk 3 just finished its extended run (I'm sure it has taken more than 28 hours); when I went to bed last night, it was at around 80%. Hereby attaching the zip file. Thanks again for all your help. Should I okay the errors, or send the drive(s) back? leonore-smart-20220113-0922.zip
ChatNoir Posted January 13, 2022 18 minutes ago, DrBobke said: "Should I okay the errors, or send the drive(s) back?"
Looks OK to me. No SMART error except for UDMA_CRC, and:

Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%        8368             -
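The self-test log rows quoted in this thread can be told apart programmatically, for example to see at a glance whether any extended test actually completed. The following is a minimal sketch that assumes rows formatted like the ones posted here; the class and function names are hypothetical, and only the "Extended offline" and "Short offline" descriptions seen in the thread are recognised.

```python
import re
from typing import List, NamedTuple


class SelfTest(NamedTuple):
    num: int
    description: str
    status: str
    remaining_pct: int
    lifetime_hours: int


# Assumption: rows look like "# 1 Extended offline Aborted by host 90% 6231 -".
ROW_RE = re.compile(
    r"#\s*(\d+)\s+(Extended offline|Short offline)\s+(.+?)\s+(\d+)%\s+(\d+)\s+\S+"
)


def parse_self_test_log(log: str) -> List[SelfTest]:
    """Extract self-test rows from log text in the layout shown in the thread."""
    return [
        SelfTest(int(m.group(1)), m.group(2), m.group(3), int(m.group(4)), int(m.group(5)))
        for m in ROW_RE.finditer(log)
    ]


def extended_test_completed(tests: List[SelfTest]) -> bool:
    """True if any extended test finished with 'Completed without error'."""
    return any(
        t.description == "Extended offline" and t.status == "Completed without error"
        for t in tests
    )


# Rows copied from the Parity 2 log posted earlier in the thread.
SAMPLE_LOG = """\
# 1  Extended offline  Aborted by host          90%  6231  -
# 2  Extended offline  Aborted by host          90%  6203  -
# 3  Extended offline  Aborted by host          90%  6089  -
# 4  Extended offline  Aborted by host          90%  6083  -
# 5  Short offline     Completed without error  00%  6081  -
"""

if __name__ == "__main__":
    tests = parse_self_test_log(SAMPLE_LOG)
    aborted = sum(t.status == "Aborted by host" for t in tests)
    print(f"{aborted} aborted, extended completed: {extended_test_completed(tests)}")
```

On the earlier Parity 2 log this reports four aborted runs and no completed extended test, whereas the Disk 3 row quoted above would count as completed.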
Vr2Io Posted January 13, 2022 54 minutes ago, ChatNoir said: "Looks OK to me"
Same.
DrBobke Posted January 13, 2022 Awesome, thanks a lot. So should I send back Parity drive II and okay disk 3 then, or just keep Parity drive II as well? I don't know how serious the errors are. Thanks again!
Vr2Io Posted January 13, 2022 RMAing Parity2 is fine; if you RMA both at once, you will have no redundancy any more.
DrBobke Posted January 13, 2022 I think I bought them from two separate companies, so I could first return the Parity drive and get a new one, and then do disk 3, no? Since there is 'only a few TB' on Disk 3, I could copy that over to an external HDD too if needed, or doesn't it work that way?
Vr2Io Posted January 13, 2022 6 hours ago, DrBobke said: "I could first return the Parity drive and get a new one and then disk 3"
Replacing faulty disks one at a time is the normal procedure. For disk3, attribute 199 means errors were recorded on the SATA interface; in most cases this is a controller or cable issue. If your system has such an issue, the same error can happen again on the replacement disk. So whether or not to RMA disk3 is really up to you.

199 UDMA_CRC_Error_Count    -O--CK  200   200   000    -    6

6 hours ago, DrBobke said: "I could copy that over to an external HDD too if needed, or doesn't it work that way?"
That's good; if the rebuild of disk3 fails or the data is corrupted, you still have a backup.
DrBobke Posted January 14, 2022 Okay, thanks a lot! I will check if I can see anything 'not right' with disk 3. Thanks a lot for all your help, I really appreciate it! What is the best course of action for removing Parity II? Shut down Unraid, pull the drive out and power back up, or do I need to do that in another way?
itimpi Posted January 14, 2022 1 hour ago, DrBobke said: "What is the best course of action in removing the Parity II? Shut down UnRaid, pull the drive out and power back up, or do I need to do that in another way?"
Stop the array, unassign parity2, then restart the array to commit the change. Your steps would leave Unraid complaining about a missing parity2 drive.
Vr2Io Posted January 14, 2022 3 hours ago, DrBobke said: "What is the best course of action in removing the Parity II? Shut down UnRaid, pull the drive out and power back up, or do I need to do that in another way?"
1 hour ago, itimpi said: "Stop Array, unassign parity2, restart array to commit change."
Both are fine; either way you end up rebuilding parity2 on the replacement disk.
DrBobke Posted January 15, 2022 Hey all, I unassigned Parity II as described (stopped the array, went to the Parity section, selected the dropdown and unassigned it, and checked that it was unassigned), then shut down Unraid, removed Parity II and booted up again. But I was stuck with 'Stale configuration'; the only way to be able to start the array again is to insert Parity II again. What is going on? I obviously want to start the array without Parity II being present...

Edit - SHIT! I just see I have lost all my dockers??? Nextcloud, duckdns, mariadb, Plex, Wireguard, OpenVPN, all of them gone! Also my 2 VMs are gone! HEEEELLLPPPP

leonore-diagnostics-20220115-1450.zip

I have just checked, my entire AppData folder is EMPTY! 😮 Luckily, I still have a backup of it from 9/01/2022 at 1.06AM. Can I restore without having to use the backup? I hope to God I didn't lose any data, but how on earth is this possible? I did the right things, no?

Edited January 15, 2022 by DrBobke (added lost dockers and VMs)
itimpi Posted January 15, 2022 2 hours ago, DrBobke said: "Edit - SHIT! I just see I have lost all my dockers??? Nextcloud, duckdns, mariadb, Plex, Wireguard, OpenVPN, all of them gone! Also my 2 VM's are gone! HEEEELLLPPPP"
I did not spot anything obvious in the diagnostics. The symptoms suggest that the drive where the appdata and/or system shares are located might have problems. Looking at the diagnostics, the appdata share seems to be on disk3 and the system share on disk1. If that is what you expect, I would suggest running a file system check on these drives.