Smart failure

unburt · March 20, 2021

Hi, could someone help me interpret this SMART failure? Is this a "replace the disk right now" problem or more of an early warning sign?

On advice gleaned from other help topics, I've run a short SMART test (no errors) and a long SMART test (an error occurred). I've attached the SMART log.
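For anyone following along, short and long self-tests are commonly started with smartmontools' `smartctl`. Below is a minimal sketch, assuming smartmontools is installed. It uses a hypothetical device path `/dev/sdX`, which you should substitute with your own. The sketch only builds and prints the command lines. Actually running them needs root, and the `run` helper is an illustrative name, not part of any tool.

```python
import shlex
import subprocess

# Hypothetical device path: replace with the disk's real /dev/sdX node.
DEVICE = "/dev/sdX"


def smart_commands(device: str) -> dict[str, list[str]]:
    """Build smartctl argument lists for the checks discussed in this thread."""
    return {
        # Start a short offline self-test (usually a couple of minutes).
        "short_test": ["smartctl", "-t", "short", device],
        # Start an extended self-test; on an 8TB disk this takes many hours.
        "long_test": ["smartctl", "-t", "long", device],
        # Show the self-test log (pass/fail and LBA of the first error).
        "selftest_log": ["smartctl", "-l", "selftest", device],
        # Show the extended comprehensive error log (48-bit LBAs).
        "error_log": ["smartctl", "-l", "xerror", device],
    }


def run(cmd: list[str]) -> str:
    """Run one command and return its stdout (needs root and smartmontools)."""
    # check=False because smartctl uses non-zero exit bits for warnings too.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    for name, cmd in smart_commands(DEVICE).items():
        print(f"{name}: {shlex.join(cmd)}")
```

A self-test runs in the background inside the drive. Check the self-test log once the estimated time has passed, rather than waiting on the start command.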

I recently retrieved an older (circa 2017) 8TB Seagate external drive from my parents'. I think it had sat unused for some time. I shucked it and put it into my array. When I started the array, it cleared the drive without complaint, and I then formatted it with XFS, again without complaint. I had just started using it (copying some files on the command line from /mnt/disk1 to this disk) when Unraid reported an error and disabled the disk.

I have also just started using a (new to me) Fujitsu D2607-A21 flavor LSI 9211 HBA card with IT firmware, purchased from a Hong Kong eBay seller. Perhaps that is related; the card hasn't proven itself yet.

From my understanding, these two errors are what Unraid is warning me about, but I don't have the background knowledge to judge how serious they are.

ATA Error Count: 2
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 4639 hours (193 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 00 ff ff ff 4f 00 23:37:42.990 WRITE FPDMA QUEUED
61 00 00 ff ff ff 4f 00 23:37:42.979 WRITE FPDMA QUEUED
60 00 e8 ff ff ff 4f 00 23:37:42.964 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:37:42.964 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:37:42.963 READ FPDMA QUEUED

Error 1 occurred at disk power-on lifetime: 4639 hours (193 days + 7 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: WP at LBA = 0x0fffffff = 268435455

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
61 00 00 ff ff ff 4f 00 23:37:37.612 WRITE FPDMA QUEUED
61 00 00 ff ff ff 4f 00 23:37:37.608 WRITE FPDMA QUEUED
61 00 00 ff ff ff 4f 00 23:37:37.604 WRITE FPDMA QUEUED
60 00 60 ff ff ff 4f 00 23:37:37.591 READ FPDMA QUEUED
60 00 00 ff ff ff 4f 00 23:37:37.591 READ FPDMA QUEUED
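The log header notes that `Powered_Up_Time` is printed as `DDd+hh:mm:SS.sss` and wraps after 49.710 days. That figure is 2^32 milliseconds, which suggests a 32-bit millisecond counter. This is an inference from the number, not something the log states. The entries above have no `DDd+` prefix, so the sketch below treats it as optional. The function names are illustrative only. The sketch also shows that the last commands before Error 1 and Error 2 were issued about 5.4 seconds apart.

```python
import re

# 2**32 milliseconds expressed in days: 4294967296 / 86_400_000 ~= 49.710
WRAP_DAYS = 2**32 / 86_400_000

# "DDd+hh:mm:SS.sss"; the "DDd+" part is optional because the entries in
# this log (e.g. "23:37:42.990") do not include it.
_PATTERN = re.compile(r"(?:(\d+)d\+)?(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def powered_up_ms(stamp: str) -> int:
    """Convert a Powered_Up_Time string to milliseconds since power-on."""
    m = _PATTERN.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognised Powered_Up_Time: {stamp!r}")
    days, hh, mm, ss, ms = (int(g) if g else 0 for g in m.groups())
    return (((days * 24 + hh) * 60 + mm) * 60 + ss) * 1000 + ms


def lifetime_split(hours: int) -> tuple[int, int]:
    """Split power-on lifetime hours into (days, hours)."""
    return divmod(hours, 24)


print(round(WRAP_DAYS, 3))   # 49.71
print(lifetime_split(4639))  # (193, 7)
gap = powered_up_ms("23:37:42.990") - powered_up_ms("23:37:37.612")
print(gap)                   # 5378
```

Because the counter wraps, these times are only comparable with each other over short spans. The lifetime in hours is the figure to use for dating an error.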

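Both errors report `LBA = 0x0fffffff`. In this summary error log, the LBA is assembled from the low nibble of DH followed by CH, CL and SN, which is only 28 bits. An 8TB disk has far more than 2^28 sectors, so an all-ones value most likely means the real address did not fit. smartctl's extended comprehensive log (`smartctl -l xerror`) records 48-bit LBAs. The sketch below reproduces the decoding from the register row. It also names two bits of ST = 0x51 (DRDY and ERR) and leaves ER = 0x40 undecoded beyond noting that only bit 6 is set, which smartctl labels `WP` here. The helper names are hypothetical.

```python
def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH register bytes."""
    # Only the low nibble of DH carries address bits (27..24).
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def status_flags(st: int) -> list[str]:
    """Name the status-register bits this sketch knows about."""
    names = {0x40: "DRDY (device ready)", 0x01: "ERR (error)"}
    return [name for bit, name in names.items() if st & bit]


# Register row from both errors: ER ST SC SN CL CH DH = 40 51 00 ff ff ff 0f
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 00 ff ff ff 0f".split())

lba = lba28(sn, cl, ch, dh)
print(hex(lba), lba)          # 0xfffffff 268435455
print(lba == 2**28 - 1)       # True: every one of the 28 bits is set
print(status_flags(st))       # ['DRDY (device ready)', 'ERR (error)']
print(bin(er))                # 0b1000000 (only bit 6 set)
```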
ST8000DM004-2CX188_WG800K2R-20210320-1529.txt

tower-diagnostics-20210320-1702.zip

Edited March 21, 2021 by unburt

John_M · March 20, 2021

It is showing read errors and has failed a self-test. I'd replace it ASAP.

G Speed · March 21, 2021

I'm getting the same errors... 8TB Samsung

If I pull the drive and check it, SMART is perfect.

Is it connected to a raid card or to mobo ports?

6.9?

JorgeB · March 21, 2021

3 hours ago, G Speed said:

8tb Samsung

There are no 8TB Samsung drives; you can post your diagnostics if you want more informed advice.

G Speed · March 21, 2021

3 hours ago, JorgeB said:

There are no 8TB Samsung drives; you can post your diagnostics if you want more informed advice.

Sorry, it was 2am lol. I meant Seagate.

unburt · March 21, 2021

Hmm, mine is attached to a Fujitsu 9211 RAID card (IT mode firmware) that I just purchased. Perhaps that has something to do with it, since the card hasn't really proven itself yet; I've only just started using it. The card or its cables could be bad.

JorgeB · March 21, 2021

SMART test failures can't be cable related.

unburt · March 21, 2021

13 minutes ago, JorgeB said:

SMART test failures can't be cable related.

Oh right. Good point.

John_M · March 21, 2021

18 hours ago, unburt said:

8TB Seagate external drive

The big problem I have with those Barracuda Compute drives is that they're designed only for very light workloads. Apart from running quite warm in their fanless plastic boxes, they are almost ideal for their intended use as occasionally used backup drives. But if you shuck them and install them in a server that's powered 24/7, you're really operating them outside their design envelope. The fact that they use SMR recording technology[1] is not a major concern for me, but the fact that even a monthly parity check will exceed their Workload Rate Limit[2] of 55 TB/year[3] by a considerable margin is. They are not even intended to be powered continuously (2400 hours/year[3], or a duty cycle of approximately 27%), though I would accept that being powered on but spun down is not as wearing as spinning continuously, especially as they are likely to stay much cooler than inside their plastic enclosures.
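To put numbers on that: a parity check reads every sector of the disk. On an 8TB drive, monthly checks alone come to about 12 × 8 = 96 TB/year against the 55 TB/year rating, before counting any ordinary reads and writes. Likewise, 2400 of the 8760 hours in a (non-leap) year is about 27%. Below is a quick check of that arithmetic, assuming exactly one full-disk parity check per month and decimal TB as used in the datasheet.

```python
DISK_TB = 8                # drive capacity in TB
CHECKS_PER_YEAR = 12       # assumption: one full parity check per month
WORKLOAD_LIMIT_TB = 55     # workload rate limit, TB/year [2][3]
POWER_ON_HOURS = 2400      # rated power-on hours per year [3]
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years

# A parity check reads the whole disk once, so each check transfers DISK_TB.
parity_tb = DISK_TB * CHECKS_PER_YEAR
duty_cycle = POWER_ON_HOURS / HOURS_PER_YEAR

print(f"parity checks alone: {parity_tb} TB/year "
      f"({parity_tb / WORKLOAD_LIMIT_TB:.0%} of the {WORKLOAD_LIMIT_TB} TB limit)")
print(f"rated duty cycle: {duty_cycle:.1%}")
```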

References:

[1] https://www.seagate.com/gb/en/internal-hard-drives/cmr-smr-list/

[2] https://www.seagate.com/gb/en/support/kb/annualized-workload-rate-005902en/

[3] https://www.seagate.com/www-content/datasheets/pdfs/3-5-barracudaDS1900-10-1802US-en_US.pdf

unburt · March 21, 2021

You make a great point about the workloads that can be expected from the drive that I have. I promise I endeavoured to use it in a very light-duty way (probably contravening some other NAS/Unraid principles, though). I intended to exclude it from other shares, write 8TB of seldom-accessed files to it now, and never touch it again. Basically, I could have kept it as an external drive, but it seemed "cleaner" to me to get it inside my PC case rather than have the enclosure sit on a nearby shelf.

Perhaps having it as a mounted unassigned disk would be an improvement so that it would not be included in monthly parity checks.

P.S. I really appreciate your citations. Reading the cited articles alongside your comments made it much easier for me to understand the issues.

Edited March 22, 2021 by unburt
