SMART reporting, testing, and notifications

February 26, 2016

* Small bug: Downloaded SMART reports may not have unique file names, because the timestamp is created only once per session

Expected behavior - each SMART report should have a unique file name for archiving, with a fresh timestamp taken at file creation time

Actual behavior - click on a disk from the Main screen to enter the disk settings page with all the SMART info. The first SMART report download of the session has a unique file name, but after various SMART activities such as testing, the next SMART report download has the same file name as the first (both the .txt and the .zip) and overwrites it (or has to be renamed)

Scenario - I was concerned about a drive, so I wanted a SMART long test with before and after SMART reports, but the 'before' report was overwritten

Workaround - download a report and do your testing, then exit back to the Main screen and return to the SMART settings to download the second report
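A minimal sketch of the expected behavior, in Python for illustration only. This is not the webGui's actual code; `smart_report_name`, the `disk_id` argument, and the file name layout are all hypothetical. The point is only that the timestamp is taken each time a download is requested, not once when the page is loaded:

```python
from datetime import datetime
from typing import Optional


def smart_report_name(disk_id: str, ext: str = "txt",
                      now: Optional[datetime] = None) -> str:
    """Build a report file name stamped at call time (hypothetical layout).

    `now` exists only so the function can be exercised with fixed times;
    in normal use it is left as None and the current time is read here,
    at the moment the file is created.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return f"{disk_id}-smart-{stamp}.{ext}"


# 'Before' and 'after' reports requested in the same session, hours apart:
before = smart_report_name("sdb", now=datetime(2016, 2, 26, 10, 0, 0))
after = smart_report_name("sdb", now=datetime(2016, 2, 26, 18, 30, 5))
print(before)
print(after)
```

With a one-second resolution, two downloads in the same second would still collide, so a real implementation might want a finer timestamp or a counter.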

* Small bug: Test results (and probably the error log too) are based on the SMART report obtained on first entering the SMART info section for a drive, and do not include the results of subsequent SMART tests

Expected behavior - be able to do a SMART test (short or long), then on its completion view the test results

Actual behavior - it doesn't matter how many tests you run, the results aren't available to view until you exit back to the Main screen and then return

Workaround - run the desired test(s), then exit back to the Main screen and return to the SMART settings to see the results

* Request: Add attribute 199 to the default attribute set for ALL users (checked ON by default)

The current 5 attributes were chosen based on statements by Backblaze about which attributes are useful for predicting drive failure. That's valid, but they aren't the only attributes of interest, for us or for them. Attribute 199 provides evidence of a bad SATA cable or connection, which is also important, even though it has nothing at all to do with drive failure. I'm very sure they monitor it too, and if it begins increasing, they know to replace a cable and check connections, just as our users need to do. Actually, our users need it more than Backblaze does, as Backblaze has probably standardized on a high quality source for their cables, and our users haven't!

Implementation details - This is the one attribute where a RAW increase of one should be ignored. No cable is perfect, and it's possible for any cable to have at least one CRC error once in a while. If a cable is bad, however, it won't stop at one CRC error; it will have many, and it will continue to have them. Also, I recommend that when this is added, you ignore any existing non-zero count, with no warnings given. If there's a cable problem, the number will continue to grow, and then the warnings will trigger.
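One way to read the rule above, sketched in Python for illustration only. The class name `CrcErrorMonitor`, its fields, and the choice to reset the reference count after each warning are assumptions, not anything from the existing notification code. The count present when monitoring starts is treated as a silent baseline, and growth of exactly one since the last acknowledged count is ignored:

```python
class CrcErrorMonitor:
    """Hypothetical watcher for the RAW value of attribute 199."""

    def __init__(self, initial_raw: int, tolerance: int = 1):
        # Whatever count already exists is accepted without a warning.
        self.acknowledged = initial_raw
        # Growth up to this many errors since the acknowledged count is ignored.
        self.tolerance = tolerance

    def check(self, raw: int) -> bool:
        """Return True if this reading should trigger a warning."""
        if raw - self.acknowledged > self.tolerance:
            # Warn once, then treat the new count as the reference point.
            self.acknowledged = raw
            return True
        return False


monitor = CrcErrorMonitor(initial_raw=7)   # drive already shows 7 CRC errors
readings = [7, 8, 9, 10, 12]
warnings = [monitor.check(r) for r in readings]
print(warnings)  # [False, False, True, False, True]
```

Because the reference count only moves when a warning fires, a cable that adds one error at a time still triggers a warning on the second error, which matches the idea that a bad cable "won't stop at one".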

* Request: Remove the 'SMART notification value' and 'SMART notification tolerance level' features, both globally and per disk

I wanted to like these features, as they were an attempt to provide flexibility to the users, particularly because of the inconsistency and variability in how SMART attributes behave. But they don't fit the behavior of the attributes that were chosen, at all. SMART notification value should ALWAYS be set to 'RAW', and if a user changed it to 'Normalized', it would be a mistake and would not work correctly at all. There *are* attributes where it is correct to monitor the VALUE (known here as the 'Normalized'), but they aren't included in the 5 attributes provided. The error rate attributes are one case, where you should ALWAYS ignore the RAW and watch the VALUE. The error rate attributes are also candidates for using a form of the 'SMART notification tolerance level', warning the user if they approach the threshold by a given percentage.

These 2 features are currently one-size-fits-all, but the attributes are all different. Any given setting for these could be right for some attributes but would be clearly wrong for others. For the attributes provided, the first one HAS to be 'RAW', and the second SHOULD be 'Absolute', which makes both settings pointless. If a user changed either one, they would be wrong. A change in Current Pending Sector count MUST create a warning even if it changes by only one, so it HAS to be 'Absolute'.
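To illustrate the point that the rule belongs to the attribute and not to a global or per-disk setting, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Policy` type, the `POLICIES` table, the 10% margin, and the way "approaching the threshold by a given percentage" is computed. Attribute 197 (Current Pending Sector) is watched on RAW with any increase warning, and attribute 1 (Raw Read Error Rate) stands in for the error rate attributes, watched on VALUE against its threshold:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    field: str               # "raw" or "value" (the normalized VALUE)
    margin_pct: float = 0.0  # only used when field == "value"


# Hypothetical per-attribute table; not the product's actual defaults.
POLICIES = {
    197: Policy("raw"),                    # Current Pending Sector: any increase
    1: Policy("value", margin_pct=10.0),   # Raw Read Error Rate: watch VALUE
}


def should_warn(attr_id: int, prev_raw: int, raw: int,
                value: int, thresh: int) -> bool:
    """Apply the rule that fits the attribute (illustrative only)."""
    policy = POLICIES[attr_id]
    if policy.field == "raw":
        # 'Absolute' behavior: even an increase of one must warn.
        return raw > prev_raw
    # One possible reading of "within a percentage of the threshold":
    # warn once VALUE has dropped to within margin_pct of THRESH.
    return value <= thresh * (1 + policy.margin_pct / 100)


# Pending sectors went from 0 to 1: warn.
print(should_warn(197, prev_raw=0, raw=1, value=100, thresh=0))
# Error rate RAW is huge but VALUE is far above THRESH: no warning.
print(should_warn(1, prev_raw=0, raw=123456789, value=117, thresh=6))
```

A single 'RAW or Normalized' switch cannot express both rows of this table at once, which is the problem with the current settings.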

I should have written up these last 2 requests a long time ago, but could never finish researching it all, especially Command Timeout (which I got rather hung up on for a long time). I do think we need a way to inform users that if they see a very large number for Command Timeout, then they need to update their SMART database (drivedb.h). But I've digressed enough already.

March 8, 2016

Rob,

As always, thanks for the very detailed and thorough write-up. We will be discussing this internally after we get the 6.2 beta released.

June 10, 2016

Author

Bump.

My apologies, I haven't installed any of the betas, so if some or all of this has been changed, I'm not yet aware of it. I'll try to install ... soon.

June 10, 2016

I'll try to install ... soon.

ROTFLOL! ....soon.

Amazing how "soon" covers a multitude of timeframes around here.

June 10, 2016

Author

I'll try to install ... soon.

ROTFLOL! ....soon.

Amazing how "soon" covers a multitude of timeframes around here.

It's the perfect word when you can't - or won't - commit yourself.
