Replacement for failing HGST 7K1000 drive

PeteBa · November 18, 2023

Hi, I have a small unRaid server made up of 5 fairly old 2.5" HGST 7K1000 1TB drives (4 x data and 1 x parity). I also have a 500GB nvme cache drive. This has worked great for over 5 years now as mainly a media server and backup storage. But one of the drives is reporting errors (SMART test attached) and so I have been looking for a replacement. Unfortunately, my searching has struggled to find a modern equivalent drive for a reasonable price.

I think it has to do with CMR vs SMR technology, and that CMR is no longer well supported at what is now a small capacity. I think the WD Red Plus drive is roughly equivalent, but it seems very expensive (£100) compared to the various SMR drives (£30-40) that come up in an Amazon search. So I guess I'm after some advice on a good 7K1000 replacement at a reasonable price.

My thoughts at the moment include: 1) I don't need any additional storage capacity, so going for larger drives doesn't seem necessary; 2) having said that, if 1TB drives are going obsolete, do I bite the bullet and upgrade the whole system to two 4TB drives, though that seems excessive for one failing drive; 3) ideally, I would replace the drive with a 1TB SSD at about the same price point, but I see very conflicting messages about TRIM on the forums; and 4) maybe I'm unfairly concerned about SMR drives, and given I have a reasonably sized NVMe cache drive it won't make any material difference.

Appreciate any thoughts/recommendations.

server-smart-20231118-1554.zip

itimpi · November 19, 2023

I do not see anything obvious in the SMART report that suggests a problem with the drive. There are a number of CRC errors, but these relate to connection issues and are nearly always caused by the power and/or SATA cabling to the drive. You could try running an extended SMART test on the drive as a health check.

PeteBa · November 19, 2023

@itimpi Thanks for the response. I recall the CRC errors occurred many years ago due to a faulty SATA cable that was replaced at that time. However, the drive has just started showing a "yellow thumbs down" on the Dashboard page and I'm getting daily e-mails quoting read error messages as below. I took a closer look at the SMART log attached above and it suggests the only recent error was a UNC error. Not sure what that is and whether it is terminal or not. I have run a few extended SMART tests and they have all said PASSED. So this is a bit confusing 8(. Do you think I can just click acknowledge on the "thumbs down" icon and monitor? Thanks again.

Most recent error from the SMART log:

Error 90 [1] occurred at disk power-on lifetime: 21475 hours (894 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 04 f8 00 00 30 05 f9 28 00 00  Error: UNC at LBA = 0x3005f928 = 805697832
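
For anyone reading the entry above: UNC in smartctl output marks an uncorrectable read error at the given sector. As a quick sanity check on the numbers in that line, a minimal Python sketch can convert the power-on hours into days and the hex LBA into decimal. The byte offset assumes 512-byte logical sectors, which the log excerpt does not state, and the variable names are illustrative only.

```python
# Values copied from the SMART error entry above.
power_on_hours = 21475
lba_hex = "0x3005f928"

# Assumption: 512-byte logical sectors (not stated in the log excerpt).
SECTOR_BYTES = 512

# Split the lifetime hours into whole days plus remaining hours.
days, hours = divmod(power_on_hours, 24)

# Parse the hexadecimal LBA and estimate how far into the disk it sits.
lba = int(lba_hex, 16)
byte_offset = lba * SECTOR_BYTES

print(f"{days} days + {hours} hours")
print(f"LBA {lba}")
print(f"roughly {byte_offset / 1e9:.1f} GB into the disk")
```

The first two printed values should match what the log itself reports; the offset is only as good as the sector-size assumption.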

Recent notification emails from unRAID:

Event: Unraid array errors
Subject: Warning [SERVER] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 2 - HGST_HTS721010A9E630_JG40006PG3RD0C (sdd) (errors 159)

Event: Unraid Parity-Check
Subject: Notice [SERVER] - Parity-Check finished (1 errors)
Description: Duration: 3 hours, 10 minutes, 47 seconds. Average speed: 87.4 MB/s
Importance: warning

Event: Unraid Status
Subject: Notice [SERVER] - array health report [FAIL]
Description: Array has 6 disks (including parity & pools)
Importance: warning

Parity - HGST_HTS721010A9E630_JR1020BN0J90DE (sdg) - active 21 C [OK]
Disk 1 - HGST_HTS721010A9E630_JR1000D31PL5WE (sdf) - active 26 C [OK]
Disk 2 - HGST_HTS721010A9E630_JG40006PG3RD0C (sdd) - active 26 C (disk has read errors) [NOK]
Disk 3 - HGST_HTS721010A9E630_JG40006PG6PP7C (sde) - active 27 C [OK]
Disk 4 - HGST_HTS721010A9E630_JS1020620JGW6W (sdc) - active 26 C [OK]
Cache - Samsung_SSD_960_EVO_250GB_S3ESNX1JB30291E (nvme0n1) - active 35 C [OK]

Parity is valid
Last checked on Sun 12 Nov 2023 03:10:48 AM GMT (yesterday), finding 1 error.
Duration: 3 hours, 10 minutes, 47 seconds. Average speed: 87.4 MB/s
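
The duration and average speed in the report can be cross-checked against the drive size. This is a rough sketch, assuming the parity drive holds a nominal 1 TB (10^12 bytes) and that the speed is quoted in decimal megabytes per second; both are assumptions, not something the emails state.

```python
# Duration copied from the parity-check notification above.
duration_s = 3 * 3600 + 10 * 60 + 47

# Assumption: nominal 1 TB parity drive, counted in decimal bytes.
NOMINAL_BYTES = 1_000_000_000_000

# Average throughput in decimal MB/s if the whole drive was read once.
avg_mb_per_s = NOMINAL_BYTES / duration_s / 1e6

print(f"{duration_s} s, about {avg_mb_per_s:.1f} MB/s")
```

Under those assumptions the result lands close to the reported figure, which suggests the check ran at normal speed across the whole drive.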

JorgeB · November 20, 2023

If you didn't reboot yet please post the diagnostics.

PeteBa · November 20, 2023

@JorgeB, I haven't rebooted the server so hoping the attached diagnostics are useful? Many thanks.

server-diagnostics-20231120-0951.zip

JorgeB · November 20, 2023

It's logged as a disk problem. If it happens again, and also considering the power-on hours, I would probably replace it.
