January 30, 2016
I have several 500 GB and 750 GB drives in an array that have been running for... many years. I don't know how to properly interpret the SMART report data, so some assistance would be appreciated.

ID   Attribute Name            Flag    Value  Worst  Threshold  Type      Updated  Failed  Raw Value
Parity:
5    Reallocated sector count  0x0033  100    100    036        Pre-fail  Always   Never   6
188  Command timeout           0x0032  100    099    000        Old age   Always   Never   2
Disk2:
188  Command timeout           0x0032  100    099    000        Old age   Always   Never   12885098502
Disk3:
188  Command timeout           0x0032  100    099    000        Old age   Always   Never   17180131332

I looked into this because browsing the SMB shares is sometimes clunky. Right-clicking on folders can be slow to open the context menu, and sometimes copying files is slow, dropping to almost zero. Thoughts?
January 30, 2016
With the reallocated sectors on your parity drive, I'd probably replace it. A big concern with reallocated sectors is that if that number continues to grow, you're looking at a drive that's probably going to die quickly. 6 sectors isn't a lot relative to the total number of sectors on the drive, but it's one of the indicators to look at when judging whether a drive is going to die. SMART reports are just an indicator to help you decide whether or not to trust the drive, not an end-all, be-all. This drive could never have any more reallocated sectors and keep running for 10 more years, or it could have 35,000 more pop up tomorrow and die quickly. It's up to you whether you think it's worth replacing. I would, myself.
Regarding the command timeouts, that's my guess as to why you're seeing intermittent slowdowns. Especially on Disk2 and Disk3, those are very high numbers. That could be due to a bad PSU or shitty cables going bad. I don't know how many power-on hours you're approaching, but something is odd at those high numbers. Basically, a timeout is counted every time the OS sends a command to the drive and the drive isn't responsive.
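Since the advice above hinges on whether the reallocated sector count keeps growing, one way to act on it is to record attribute 5's raw value periodically and flag every increase. The following is a minimal Python sketch, not a monitoring tool: `growth_events` is a hypothetical helper, it assumes you collect the dated readings yourself (for example by copying them from each SMART report), and every reading after the first is invented for illustration.

```python
from datetime import date


def growth_events(readings):
    """List every increase in SMART attribute 5 (Reallocated sector count).

    readings: iterable of (datetime.date, raw_count) pairs, in any order.
    Returns (date, previous_count, new_count) tuples, oldest first.
    """
    ordered = sorted(readings)  # chronological order, so neighbours compare correctly
    events = []
    for (_, prev), (when, cur) in zip(ordered, ordered[1:]):
        if cur > prev:
            events.append((when, prev, cur))
    return events


# Hypothetical readings for the parity drive: only the first value (6)
# comes from the report above; the later ones are invented for illustration.
parity_log = [
    (date(2016, 1, 30), 6),
    (date(2016, 2, 6), 6),
    (date(2016, 2, 13), 9),
]

for when, old, new in growth_events(parity_log):
    print(f"{when}: reallocated sectors went from {old} to {new}")
# prints: 2016-02-13: reallocated sectors went from 6 to 9
```

A single event is worth noting. A steady stream of them is the pattern described above as a drive that's probably going to die quickly.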
January 30, 2016
Are you using ReiserFS with almost-full disks? That's the first thing that comes to mind with your symptoms. Also, like lishpy suggested, keep an eye on the parity drive's reallocated sector count.
January 30, 2016 (Author)
I'm going to replace the parity drive. It's an old 1.5TB drive from back when that was the biggest drive you could get. If the parity drive is falling apart, that would explain the crappy writes. Most of the drives have 3+ years of power-on time and have been in the array for 5 or more years. This is a good time, and a good excuse, to replace everything with two 4TB drives and remove the 7 disks that are currently spinning in there. I've only got about 3.2TB of data.
January 30, 2016 (Author)
ReiserFS, yes. One disk is 99 percent full, so it never gets any writes; it's just movies. The rest are 70-80 percent full.
January 30, 2016
I still believe you'd notice a big improvement by changing to XFS; unfortunately, there's no easy way to test that without doing the actual conversion.
January 30, 2016 (Author)
Well, if I just get two new 4TB drives and migrate to those, switching to XFS will be easy.
February 1, 2016
Please ignore the 'Command timeout' numbers, especially the ones that appear as very large numbers. They are being misinterpreted by an older SMART tool. The raw value is actually 48 bits, made up of three 16-bit values. If you convert the two large numbers from the original post to hex, Disk2's 12885098502 becomes 0x000300030006 and Disk3's 17180131332 becomes 0x000400040004. Read as three 16-bit values, the first is "3 3 6" and the second is "4 4 4": very low and inconsequential. If possible, try a later version of the SMART tool to obtain the SMART report.
I had wanted to report my findings on the 'Command timeout' attribute, and did quite a bit of research, but can never seem to finish. What I did find is very inconsistent. There are two quite different descriptions of it, and the better-known (and often-quoted) one on the SMART wiki page is probably wrong, or at the very least partly wrong. Associating it with cabling seems impossible, unless someone knows of commands *issued* by a hard drive. A hard drive is normally a hosted device: commands are issued by the host controller, not the other way around. Plus, if it were cabling-related, that would make Backblaze's reliance on it as a measure of drive longevity very strange!
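The hex conversion described above can be checked with a few lines of Python. This sketch assumes, as the post describes, that the raw value is three 16-bit words packed into 48 bits and printed high word first. `split_raw48` is a hypothetical helper name, and what each of the three words actually counts is not settled by it.

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into three 16-bit words, high word first."""
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


# Raw 'Command timeout' (attribute 188) values from the report above.
for label, raw in [("Parity", 2), ("Disk2", 12885098502), ("Disk3", 17180131332)]:
    print(f"{label}: {raw:#014x} -> {split_raw48(raw)}")
# Parity: 0x000000000002 -> (0, 0, 2)
# Disk2: 0x000300030006 -> (3, 3, 6)
# Disk3: 0x000400040004 -> (4, 4, 4)
```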
A bad cable does not make a bad hard drive, although it can make the drive look bad to the host. I also found Backblaze's published graphs almost embarrassing. I'm sure they have very knowledgeable people there who do understand SMART, but the graphs were prepared by someone who wasn't as knowledgeable. A number of the graphs show separate humps at both the 100 point AND the 200 point, clearly not normalized. 200 is the same as 100, just on a different scale (some drive models start their normalized values at 200 rather than 100), and if you want to include those attribute values in a single graph, they HAVE to be normalized to the same scale first. The 'Command timeout' graph includes the very large numbers, not correctly interpreted as three small numbers, which makes that graph essentially worthless. I can't help worrying that they based their opinion of the 'Command timeout' attribute on the faulty representation in that graph.
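The normalization step the post calls for can be sketched in Python. Beyond what the post says, this assumes you know each drive model's best (starting) normalized value and that the vendor's scale is linear, so each value is divided by its best value and multiplied by a common target. `rescale` and the sample data are hypothetical.

```python
from collections import Counter


def rescale(value, best, target_best=100):
    """Put a normalized SMART value on a common 0..target_best scale.

    best: the normalized value the drive model reports when the attribute is
    perfect (e.g. 100 or 200). Assumes the vendor's scale is linear from 0.
    """
    return value * target_best / best


# Hypothetical sample of (normalized value, best value for that model) pairs.
samples = [(100, 100), (200, 200), (100, 100), (200, 200), (99, 100)]

raw_hist = Counter(v for v, _ in samples)                      # humps at 100 and 200
fixed_hist = Counter(rescale(v, best) for v, best in samples)  # one hump at 100.0
print(raw_hist)
print(fixed_hist)
```

Plotting `raw_hist` reproduces the two-hump artifact; plotting `fixed_hist` puts both groups of drives on the same scale.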