tillkrueger Posted June 15, 2016
One problem solved and immediately I am hit with the next one. I just got a notification that my unRAID array, which is all the way up in Berlin (500 miles away from me), is in a compromised state: disk 9 was marked as disabled after it produced 9 errors during a New Permissions operation. Even though I do have a cache drive in my server, I am not really using it as such...it is not configured for its intended purpose, but only as an additional drive to copy data to and from during my usual file maintenance. Since the cache drive is also a 3TB drive, just like my failed data disk, what would the correct procedure be to make my cache drive the new data disk 9 without risking the loss of data?
JorgeB Posted June 15, 2016
- stop array
- unassign cache disk
- assign cache disk to failed disk slot
- start array to begin rebuild
tillkrueger Posted June 15, 2016
Well, when you say it like that, Johnnie, you make it sound so easy...and it was...rebuilding now...I forget sometimes how brilliant the unRAID system is. Thanks for the concise instructions! Is there some way you know of to try to remotely "fix" the failed drive?
JorgeB Posted June 15, 2016
Post the diagnostics: Tools > Diagnostics
tillkrueger Posted June 15, 2016
Here they are. I am amazed at how regularly these WD Red drives fail on me...I must have had about 10 of them go bad on me in the past 3-4 years...very frustrating...I thought they were meant to be particularly long-lasting in this type of server application.
Attachment: unraid-diagnostics-20160615-1050.zip
testdasi Posted June 15, 2016
tillkrueger wrote: "I am amazed about how regularly these WD-Red drives fail on me...I must have had about 10 of them go bad on me in the past 3-4 years...very frustrating...I thought they are meant to be particularly long-lasting in these type of server applications."
Do they come from the same order? You might have a bad batch (or WD Red drives are crap, or you have bad luck, or any combination of those). I make it a point never to order the same drive from the same seller within a short period or in the same order. I read somewhere that drives from the same batch tend to have similar issues.
tillkrueger Posted June 15, 2016
I wish it were that obvious a reason, testdasi, but I've been ordering WD Green and Red drives for the past 15 years, from all over the place (Amazon, B&H, CyberPort, EggeHead) in both the USA and Germany, with no discernible commonalities (other than that they are all WD drives), so my guess is that it's a combination of bad quality and bad luck...what's worse is that WD's warranty isn't international: they won't swap German drives in the USA, or US drives in Germany, which makes it very difficult for someone like me who lives and works on two continents. Which drives do you use/recommend? I might have to start looking elsewhere and slowly start swapping out drives, one by one, for another manufacturer's.
JorgeB Posted June 15, 2016
The failed disk dropped offline, so there's no SMART report. When the rebuild finishes, reboot the server and post new diagnostics, or just the SMART report for that disk.
tillkrueger Posted June 15, 2016
Will do, johnnie...about 12hrs to go for the rebuild.
testdasi Posted June 15, 2016
tillkrueger wrote: "which drives do you use/recommend? I might have to start looking elsewhere and slowly start swapping out drives, one by one, to another manufacturer."
I really don't want to go into brand recommendations because the proverbial "YMMV" applies. My sample is also quite small, only about 10+ drives, of which the only one that failed was a WD Black, and that was because Amazon stupidly posted it in a thin cardboard envelope with zero padding. Personally, I have always put Hitachi as my first choice; that appears to be confirmed by Backblaze in 2015 (<1% failure rate vs WD 2.5% and Seagate 3%). But it is now owned by WD (rebranded as "HGST"), so who knows where things are heading.
tillkrueger Posted June 15, 2016
Points well taken, testdasi...I'm just getting tired of looking at a big stack of failed 1, 2 and 3TB drives from the past 15 years that all bear the WD label, and for many of which I wasn't able to take advantage of the warranty, as sending them across the Atlantic is just not economical or timely...time to find a new drive manufacturer with a better track record and an international warranty.
tillkrueger Posted June 16, 2016
OK, the rebuild has completed, parity is synced and the array has started again, but how would I now get a SMART report from the failed disk? I am not able to see it anywhere. Could I enable the cache feature and select the failed disk, so that I can at least access it via telnet and maybe run some check-disk operations on it? Or is there a better way to access it for potential repairs? (I am pretty sure there is.)
JorgeB Posted June 16, 2016
Did you reboot? If the disk is still offline, try to get someone to power cycle the server. If it's still offline after that, it's probably dead.
tillkrueger Posted June 16, 2016
No, I haven't rebooted yet, but will do so shortly and report back.
tillkrueger Posted June 16, 2016
OK, after a reboot the disk is showing again, and I added it as a cache drive for the moment (without restarting the array). I'm trying to get the SMART test results, but it's been taking 10 minutes already...will post when/if completed. The Attributes tab produces the following results:

  #    Attribute Name            Flag    Value  Worst  Threshold  Type      Updated  Failed  Raw Value
  1    Raw read error rate       0x002f  200    200    051        Pre-fail  Always   Never   4
  3    Spin up time              0x0027  179    174    021        Pre-fail  Always   Never   6008
  4    Start stop count          0x0032  100    100    000        Old age   Always   Never   486
  5    Reallocated sector count  0x0033  200    200    140        Pre-fail  Always   Never   0
  7    Seek error rate           0x002e  200    200    000        Old age   Always   Never   0
  9    Power on hours            0x0032  084    084    000        Old age   Always   Never   12201 (1y, 4m, 22d, 9h)
  10   Spin retry count          0x0032  100    100    000        Old age   Always   Never   0
  11   Calibration retry count   0x0032  100    253    000        Old age   Always   Never   0
  12   Power cycle count         0x0032  100    100    000        Old age   Always   Never   70
  192  Power-off retract count   0x0032  200    200    000        Old age   Always   Never   15
  193  Load cycle count          0x0032  200    200    000        Old age   Always   Never   470
  194  Temperature celsius       0x0022  120    085    000        Old age   Always   Never   30
  196  Reallocated event count   0x0032  200    200    000        Old age   Always   Never   0
  197  Current pending sector    0x0032  200    200    000        Old age   Always   Never   1
  198  Offline uncorrectable     0x0030  100    253    000        Old age   Offline  Never   0
  199  UDMA CRC error count      0x0032  200    200    000        Old age   Always   Never   0
  200  Multi zone error rate     0x0008  100    253    000        Old age   Offline  Never   0
JorgeB Posted June 16, 2016
The disk has pending sector(s); that's why it was disabled. You can try preclearing it a few times and see if the pending sectors go to 0 and stay there. Still, there's a high probability of getting more in the future.
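For anyone following along from the command line, here is a minimal sketch of pulling the pending-sector count (SMART attribute 197) out of an attribute table. It assumes the usual `smartctl -A` column layout, where the raw value is the last field on the line. `pending_sectors` is a made-up helper name, and /dev/sdd in the usage comment is simply the device name that appears later in this thread.

```shell
# Print the raw value of SMART attribute 197 (Current_Pending_Sector).
# Reads an attribute table on stdin; assumes the raw value is the last column.
pending_sectors() {
    awk '$1 == 197 { print $NF }'
}

# Usage on a live system (assumption: the disk is /dev/sdd):
#   smartctl -A /dev/sdd | pending_sectors
```

Running this before and after each preclear cycle gives a simple way to see whether the count goes to 0 and stays there.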
tillkrueger Posted June 16, 2016
Yeah, I am prepared to wave that drive goodbye, like all the other ones, but while I don't have physical access to the server, which may be the case for many weeks to come, I might as well try to get some non-critical use out of it. What's the correct procedure to preclear this drive? Is there a telnet command that executes this, and if so, how long does one preclear pass take?
tillkrueger Posted June 16, 2016
Just found the preclear plugin in Community Apps, installed it and the preclear_disk.sh script, and will see how far I can get with this...looks like this process will take many days if I do the 3 cycles you recommend...here's to trying!
tillkrueger Posted June 16, 2016
When opening the monitor pop-up window for this operation I get this:

    /boot/config/plugins/preclear.disk/preclear_disk.sh -c 1 /dev/sdd 2>/tmp/preclear.log
    root@unRAID:/usr/local/emhttp# /boot/config/plugins/preclear.disk/preclear_disk.sh -c 1 /dev/sdd 2>/tmp/preclear.log
    Sorry: Device /dev/sdd is busy.: 1
    root@unRAID:/usr/local/emhttp#

Why is /dev/sdd busy and how do I resolve this?
JorgeB Posted June 16, 2016
The old preclear script + plugin doesn't work on v6.2-beta; use the new beta: http://lime-technology.com/forum/index.php?topic=39985.msg453938#msg453938
tillkrueger Posted June 16, 2016
Thanks so much for all your great pointers, johnnie...it is now doing the preclear:

    sdd  WDC_WD30EFRX-68AX9N0_WD-WMC1T0098667  31 C  3 TB  Pre-Read: 0% @ 141 MB/s (0:01:18)

I am starting with 1 cycle to see where I end up...will do the other 2 cycles once I find out how long the first takes...could be a loooong time.
tillkrueger Posted June 16, 2016
Hmm, the last thing I saw was something about failing a write because of a broken pipe, and this is what the status window now shows:

    unRAID Server Pre-Clear of disk /dev/sdd
    Cycle 1 of 1, partition start on sector 64.

    Step 1 of 5 - Pre-read verification:                  [2:15:17 @ 369 MB/s] SUCCESS
    Step 2 of 5 - Zeroing the disk:                       [0:00:03 @ 0 MB/s]   SUCCESS
    Step 3 of 5 - Writing unRAID's Preclear signature:                         SUCCESS
    Step 4 of 5 - Verifying unRAID's Preclear signature:                       FAIL

    Cycle elapsed time: 2:16:09 | Total elapsed time: 2:16:10

    S.M.A.R.T. Status

    ATTRIBUTE                    INITIAL  STATUS
    5-Reallocated_Sector_Ct      0        -
    9-Power_On_Hours             12203    -
    194-Temperature_Celsius      31       -
    196-Reallocated_Event_Count  0        -
    197-Current_Pending_Sector   1        -
    198-Offline_Uncorrectable    0        -
    199-UDMA_CRC_Error_Count     0        -

    SMART overall-health self-assessment test result: PASSED

    --> FAIL: unRAID's Preclear signature not valid.

Should I try another one? Although the disk has now disappeared from the unassigned disk choices in the preclear settings.
JorgeB Posted June 16, 2016
The disk probably dropped offline again; if so, you need to reboot for it to show up.
tillkrueger Posted June 16, 2016
Since this first cycle "only" took a bit over 2hrs (maybe because it stopped short of completing?), is there an advantage to choosing 3 cycles right off the bat? If so, I'll run it later, overnight. (I am in Germany, where it's 1:42pm now.)
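A rough way to sanity-check these timings: one full pass over the disk takes about its size divided by the average speed. The sketch below assumes the disk is a round 3,000,000 MB (decimal megabytes) and uses integer arithmetic, so it is only an approximation. `pass_duration` is a hypothetical helper, not part of the preclear script.

```shell
# Estimate how long one full pass over a disk takes.
# $1 = disk size in MB (assumption: 3 TB is treated as 3000000 MB)
# $2 = average speed in MB/s
pass_duration() {
    size_mb="$1"
    speed="$2"
    secs=$(( size_mb / speed ))
    printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

pass_duration 3000000 141   # prints 5:54:36
pass_duration 3000000 369   # prints 2:15:30
```

At the 141 MB/s shown at the start of the pre-read, a single pass would take nearly 6 hours, while the 369 MB/s average in the status window works out to about 2h15, in line with the reported pre-read time. Per the status window, a cycle contains at least a pre-read pass and a zeroing pass, so multi-cycle runs multiply accordingly.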
tillkrueger Posted June 16, 2016
Looking at the preclear status window, I noticed that the pending sector count has increased from 1 to 2. After searching for "reiserfs pending sector fix" I came across someone's post saying the following:

"Recently I had this problem with a hard drive and a reiserfs partition. There were 6 pending sectors. After reading BadBlockHowTo.txt I was almost hopeless. Anyway I did
    # dd if=/dev/hdc1 of=/dev/null bs=512
on a faulty partition. Of course, it failed at the same block as smartctl showed as bad. Then I ran again same command
    # dd if=/dev/hdc1 of=/dev/null bs=512
it stopped on another block. After the third run of dd there were no bad blocks - they were all relocated automagically... smartctl confirmed this. No clue why, but it worked."

Is this something that might work in my situation as well? And if so, is the dd command included in unRAID, or if not, could it be installed by someone as inexperienced with the command prompt as I am?
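dd is part of the standard core utilities on Linux systems, so it should already be available from the unRAID console; that is an assumption worth confirming with `which dd` before relying on it. Below is a minimal sketch of the read-only scan described in the quoted post, wrapped in a hypothetical helper named `read_scan`. It only reads from the source and discards the data, and the block size of 65536 bytes is an arbitrary choice for speed (the quoted post used 512).

```shell
# Read every block of a device or file and discard the data.
# Nothing is written to the source. Reports whether the read completed.
read_scan() {
    src="$1"
    if dd if="$src" of=/dev/null bs=65536 2>/dev/null; then
        echo "read OK: $src"
    else
        echo "read FAILED: $src"
        return 1
    fi
}

# Usage (assumption: the suspect disk is /dev/sdd and is not in the array):
#   read_scan /dev/sdd
```

Whether repeated reads clear a pending sector depends on the drive managing to read it successfully, so this is not guaranteed to help; comparing the SMART pending-sector count before and after is the only way to tell.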