January 30, 2018 Community Expert 1 hour ago, gnollo said: SMART Extended self test still running after 6 hours, now at 90%, does it normally take that long? From the disk's SMART info: Quote Extended self-test routine recommended polling time: ( 546) minutes.
January 30, 2018 Extended self test reads through every single sector on the drive. I don't know the inside specifics of your drive, so here is just an illustrative example: if the drive has 200k tracks per surface and 12 heads, then it must rotate at least 2.4 million turns to read every sector, because the drive only uses one head at a time. If the drive rotates at 7200 rpm and we ignore the seek time when switching tracks, then 2.4 million rotations / 7200 rpm ≈ 333 minutes, or about 5.56 hours. If the drive has more cylinders or rotates slower, then it will take longer.

And every time the head moves to a new track it will take a while to synchronize and center exactly over the new track, so the drive will not manage to read sectors for 100% of the time. And since tracks in new drives aren't perfectly aligned as cylinders anymore, the drive can't rotate one turn to read with the first head, then on the next rotation read with the second head, and on the next turn read with the third head. Moving from one head to another means a new search operation to align with the actual track locations on that surface.

So when doing a linear read, the drive will read a number of tracks with one head, then switch over and do a number of tracks with another head, then a number of tracks with yet another head. So if the drive has 200k tracks per surface and 12 heads, then there will in total be 2.4 million track seeks that will add to the total read-test time.
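The estimate in the post above can be reproduced with a short Python sketch. The geometry (200k tracks per surface, 12 heads, 7200 rpm) is the illustrative example from the post, not the real layout of any particular drive, and the function name is made up for this example.

```python
def min_full_read_hours(tracks_per_surface, heads, rpm):
    """Lower bound on the time to read every track once, ignoring seek time."""
    # One full rotation per track, and only one head is active at a time.
    rotations = tracks_per_surface * heads
    minutes = rotations / rpm  # rpm = rotations per minute
    return minutes / 60


hours = min_full_read_hours(200_000, 12, 7200)
print(f"{hours:.2f} hours")
```

This is only a floor: the 2.4 million track-to-track seeks described above come on top of it, which is why a real extended test takes noticeably longer.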
January 30, 2018 Author Completed without errors. I guess I am good to go with rebuilding. tower-smart-20180129-2240.zip
January 30, 2018 Community Expert Hmm, the test passed and Raw_Read_Error_Rate remains at 1, but:

before:
Quote
  1 Raw_Read_Error_Rate   POSR-K  200 200 051 - 1
200 Multi_Zone_Error_Rate ---R--  100 253 000 - 0

after:
Quote
  1 Raw_Read_Error_Rate   POSR-K  200 200 051 - 1
200 Multi_Zone_Error_Rate ---R--  200 200 000 - 2

Not a failing drive, but again, not a good sign. I would still rebuild but keep an eye on it, and if it fails again I would replace it.
January 31, 2018 Author Oh dear. Read errors 10 minutes in, rebuild cancelled, disk disabled. I am thinking of stopping the array, removing drive 7, powering down, moving the drive to a different slot in my Norco 550, powering it up again, stopping the array, selecting the drive and trying another rebuild before I buy a replacement drive. Any objections to this course of action? tower-diagnostics-20180131-0630.zip
January 31, 2018 Community Expert It dropped offline again. I'd like to see a new SMART report, and if it looks fine maybe try a last rebuild, but after swapping it with a disk on the onboard controller, just to rule out the SAS2LP and any possible cable issue.
January 31, 2018 Author Another SMART report, a quick test this time. Another pass. I will move the drive to a different slot and connect it directly to the motherboard. tower-smart-20180131-2041.zip
January 31, 2018 Community Expert The more important SMART attributes remain unchanged. Using the motherboard controller will help confirm whether the disk has issues or not, though surviving a rebuild won't really be enough, but it's a start.
February 1, 2018 Author Data rebuild at 53.8%, has been running for 8 hours now, I am keeping my fingers crossed...
February 1, 2018 Author
Event: unRAID Parity sync / Data rebuild
Subject: Notice [TOWER] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 14 hours, 55 minutes, 40 seconds. Average speed: 148.9 MB/s
Importance: normal
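As a quick sanity check on the notification above, duration multiplied by average speed should come out close to the size of the disk being rebuilt. The sketch below assumes that "MB" in the notification means 10^6 bytes, which is an assumption rather than something stated in the thread; the function name is hypothetical.

```python
def data_processed_tb(hours, minutes, seconds, avg_mb_per_s):
    """Total data covered by the rebuild, in decimal terabytes."""
    duration_s = hours * 3600 + minutes * 60 + seconds
    total_mb = duration_s * avg_mb_per_s
    return total_mb / 1_000_000  # assumption: 1 TB = 1,000,000 MB


tb = data_processed_tb(14, 55, 40, 148.9)
print(f"{tb:.2f} TB")
```

The result is roughly 8 TB, which matches the 8TB drive size mentioned later in the thread.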
February 1, 2018 Community Expert Great, a few more days should confirm if the disk is really good or not. In any case I would recommend replacing the SAS2LP. I know it can be difficult to justify when it appears they mostly work OK, but they are a known problem and IMO it's just a matter of time until they cause problems. Other than some sporadic use on my test server I have all my SASLP/SAS2LP gathering dust, and they were quite expensive since I bought them all new, but they are a ticking time bomb.
February 1, 2018 Author Damn. I only have one drive on it so far. What do you use now?
February 1, 2018 Community Expert 2 minutes ago, gnollo said: Damn. I only have one drive on it so far. What do you use now? Now I only use the onboard Intel controller, LSI HBAs and SAS expanders when needed. As for the LSI models, any LSI with a SAS2008/2308/3008 chipset in IT mode, e.g., 9201-8i, 9211-8i, 9207-8i, 9300-8i, etc., and clones like the Dell H200/H310 and IBM M1015; these latter ones need to be crossflashed.
March 3, 2018 Author Well, after I went away for a few days I returned to find read errors on four drives. I rebooted, and the drives were then reported as missing. After a quick check I realised that they were all in the same Norco drive bay. I have another two (spare) drive bays from Norco, so I moved the drives over to one of the spare ones and voila, it all works again. I did a parity check (without rewrite), and it found no errors in the parity. It did throw up 93 read errors on drive 7 early during the parity check, and now in the status emails I get a FAILED report because of that. How shall I go about correcting it? BTW I checked and disk7 is connected directly to the motherboard.
March 4, 2018 Author 2 hours ago, johnnie.black said: Please post your diagnostics: Tools -> Diagnostics Here it is: tower-diagnostics-20180304-1027.zip
March 4, 2018 Community Expert It's the same disk that showed issues before:

On 1/30/2018 at 11:12 PM, johnnie.black said:
Quote
  1 Raw_Read_Error_Rate   POSR-K  200 200 051 - 1
200 Multi_Zone_Error_Rate ---R--  200 200 000 - 2
Not a failing drive, but again, not a good sign, I would still rebuild but keep an eye on it, and if it fails again I would replace it.

Now it's much worse, you'll need to replace it:

ID# ATTRIBUTE_NAME        FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate   POSR-K  200   200   051    -    44
200 Multi_Zone_Error_Rate ---R--  200   200   000    -    2
March 4, 2018 Author Can you please explain how it is much worse? The only change I see is the raw value going from 1 to 44.
March 4, 2018 Community Expert
Quote Can you please explain how is it much worse?
44 is much worse than 1. For WD disks it should ideally be 0, and anything above single-digit errors on this attribute is very bad news. That, together with the UNC at LBA errors reported in the SMART report, makes this a failing disk, and very likely it won't get any better, it will only get worse.
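The before/after comparison done by eye in this thread can be sketched in Python. This is only an illustration: it assumes each attribute line has exactly the eight whitespace-separated fields shown in the posts (ID, name, flags, value, worst, thresh, fail, raw) and that the raw value is a plain integer, which is not true for every attribute on every drive. The function names are hypothetical.

```python
def parse_attr_line(line):
    """Parse one line like '1 Raw_Read_Error_Rate POSR-K 200 200 051 - 44'."""
    fields = line.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": int(fields[7]),  # assumption: raw value is a plain integer
    }


def raw_increases(before_lines, after_lines):
    """Return {attribute name: (old raw, new raw)} for raw values that grew."""
    before = {a["name"]: a["raw"] for a in map(parse_attr_line, before_lines)}
    changes = {}
    for attr in map(parse_attr_line, after_lines):
        old = before.get(attr["name"])
        if old is not None and attr["raw"] > old:
            changes[attr["name"]] = (old, attr["raw"])
    return changes


# Attribute lines copied from the January and March reports in this thread.
january = [
    "1 Raw_Read_Error_Rate POSR-K 200 200 051 - 1",
    "200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 2",
]
march = [
    "1 Raw_Read_Error_Rate POSR-K 200 200 051 - 44",
    "200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 2",
]

changes = raw_increases(january, march)
print(changes)  # {'Raw_Read_Error_Rate': (1, 44)}
```

Note that the normalized VALUE column stays at 200 in both reports; only the raw value moved, which is why the change is easy to miss.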
March 21, 2018 Author On 04/03/2018 at 12:04 PM, johnnie.black said: 44 is much worse than 1. For WD disks it should ideally be 0, and anything above single-digit errors on this attribute is very bad news. That, together with the UNC at LBA errors reported in the SMART report, makes this a failing disk, and very likely it won't get any better, it will only get worse. Mmmh, it's still running, no change. I ran another SMART test and it passed, rebooted the unRAID server, and now it thinks it's all OK. SMART test attached. Shouldn't unRAID take care of the drive if it fails anyway? I am going to run another full parity check this weekend. tower-smart-20180321-1928.zip
March 21, 2018 Community Expert It can run for weeks or months, and yes, unRAID will take care of it, but if you have single parity it can fail again at a bad time, i.e., when you're rebuilding another disk. If you have dual parity it's much less risky.
March 21, 2018 Community Expert Do you know that unRAID can only "take care" of as many disks as you have parity? It requires parity plus all other disks to rebuild a missing or disabled disk. Why would you even consider continuing to run with this disk? What if you have another disk problem? Letting problems accumulate is the path to data loss.
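The rule stated in the post above (parity can cover only as many missing or disabled disks as there are parity disks) fits in a one-line check. This is a simplified sketch with a hypothetical function name, not unRAID's actual logic.

```python
def can_rebuild(parity_disks, failed_disks):
    """True if every failed disk can be rebuilt from parity plus the healthy disks."""
    return failed_disks <= parity_disks


print(can_rebuild(parity_disks=1, failed_disks=1))  # single parity, one failure
print(can_rebuild(parity_disks=1, failed_disks=2))  # second failure during a rebuild
print(can_rebuild(parity_disks=2, failed_disks=2))  # dual parity
```

With single parity, a marginal disk failing while another disk is being rebuilt lands in the second case.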
March 21, 2018 The main question to ask whenever setting up a file server is: how much data are you prepared to lose? And how much are you prepared to spend to reduce the probability of data loss? Best is to write that down and tape it to the file server, so you are constantly reminded of the decisions you have made. If you add more valuable files then your previous decisions need to be reevaluated. If the data is important, then the files should be stored on known-good drives. And there should be proper off-site backups. Selecting a system with parity indicates that you have somewhere made a decision that you value at least some of the file data.
March 21, 2018 Author Yes. 1 minute ago, pwm said: The main question to ask whenever setting up a file server is: how much data are you prepared to lose? And how much are you prepared to spend to reduce the probability of data loss? Best is to write that down and tape it to the file server, so you are constantly reminded of the decisions you have made. If you add more valuable files then your previous decisions need to be reevaluated. If the data is important, then the files should be stored on known-good drives. And there should be proper off-site backups. Selecting a system with parity indicates that you have somewhere made a decision that you value at least some of the file data. Yep I did, but really all my files are movies, commodity data which I have on disk and which I like on a server because of all the drill-through functionality that Emby provides. I have my pictures and documents backed up on Google cloud; I pay for 100GB and pictures are uploaded free, so I have a backup to the unRAID backup. I understand, guys, that I am taking a risk, but we are talking here of forking out for another 8TB drive today, instead of forking out when the drive fails, some time in the future. I will only lose data if 2 drives fail at the same time, right? I have alerts set up so every night I get a report from unRAID; if another drive (or this one) fails, I will replace it. I guess that if another drive fails, and as I try to rebuild it this one also fails, I am in big trouble. Decision time I guess. Does anyone know of good deals for 8TB drives in the UK?
March 22, 2018 Author https://www.ebuyer.com/760589-seagate-ironwolf-8tb-3-5-nas-hard-drive-at-ebuyer-com-st8000vn0022 Would you recommend this drive as a data one (not parity)?