PeteBa Posted November 18, 2023

Hi, I have a small unRaid server made up of 5 fairly old 2.5" HGST 7K1000 1TB drives (4 x data and 1 x parity). I also have a 500GB NVMe cache drive. This has worked great for over 5 years now, mainly as a media server and backup storage. But one of the drives is reporting errors (SMART test attached), so I have been looking for a replacement.

Unfortunately, I have struggled to find a modern equivalent drive for a reasonable price. I think it has to do with CMR vs SMR technology, and that CMR is no longer well supported at what is now a small capacity. I think the WD Red Plus drive is roughly equivalent, but it seems very expensive (£100) compared to the various SMR drives (£30-40) that come up in an Amazon search.

So I guess I'm after some advice on a good 7K1000 replacement at a reasonable price. My thoughts at the moment include:

1) I don't need any additional storage capacity, so going for larger drives doesn't seem necessary;
2) having said that, if 1TB drives are going obsolete, do I bite the bullet and upgrade the whole system to two 4TB drives? That seems excessive for one failing drive;
3) ideally, I would replace the drive with a 1TB SSD at about the same price point, but I see very conflicting messages about TRIM on the forums; and
4) maybe I'm unfairly concerned about SMR drives, and given I have a reasonably sized NVMe cache drive, it won't make any material difference.

Appreciate any thoughts/recommendations.

server-smart-20231118-1554.zip
itimpi Posted November 19, 2023

I do not see anything obvious in the SMART report that suggests a problem with the drive. There are a number of CRC errors, but these relate to connection issues and are nearly always caused by the power and/or SATA cabling to the drive. You could try running an extended SMART test on the drive as a health check.
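To illustrate the distinction drawn above between connection errors and drive errors, here is a minimal sketch that sorts SMART raw counters into a cabling/link group and a disk-surface group. The attribute IDs and names follow the labels smartctl commonly prints; the `triage` helper and the example values are hypothetical and are not taken from the attached report.

```python
# Minimal sketch: separate link-related SMART counters from media-related ones.
# Attribute names follow smartctl's usual labels; vendors may differ.

LINK_ATTRIBUTES = {199: "UDMA_CRC_Error_Count"}
MEDIA_ATTRIBUTES = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def triage(raw_values):
    """Return the nonzero counters, grouped as 'link' or 'media'.

    raw_values maps a SMART attribute ID to its raw counter value.
    """
    report = {"link": [], "media": []}
    for group, table in (("link", LINK_ATTRIBUTES), ("media", MEDIA_ATTRIBUTES)):
        for attr_id, name in table.items():
            count = raw_values.get(attr_id, 0)
            if count > 0:
                report[group].append(f"{name}={count}")
    return report


# Hypothetical values for illustration only (not from the attached SMART report).
example_raw_values = {5: 0, 197: 0, 198: 0, 199: 12}
result = triage(example_raw_values)
print(result)
```

The CRC counter is cumulative, so a nonzero value can date from a cabling fault that has since been fixed. Nonzero entries in the media group are the ones that point at the drive itself.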
PeteBa Posted November 19, 2023 (Author)

@itimpi Thanks for the response. I recall the CRC errors occurred many years ago due to a faulty SATA cable that was replaced at that time. However, the drive has just started showing a "yellow thumbs down" on the Dashboard page, and I'm getting daily e-mails quoting read error messages as below.

I took a closer look at the SMART log attached above, and it suggests the only recent error was a UNC error. I'm not sure what that is or whether it is terminal. I have run a few Extended SMART Tests and they have all said PASSED. So this is a bit confusing 8(.

Do you think I can just click acknowledge on the "thumbs down" icon and monitor? Thanks again.

Most recent error from the SMART log:

Error 90 [1] occurred at disk power-on lifetime: 21475 hours (894 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER -- ST COUNT  LBA_48  LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 04 f8 00 00 30 05 f9 28 00 00  Error: UNC at LBA = 0x3005f928 = 805697832

Recent notification emails from unRAID:

Event: Unraid array errors
Subject: Warning [SERVER] - array has errors
Description: Array has 1 disk with read errors
Importance: warning

Disk 2 - HGST_HTS721010A9E630_JG40006PG3RD0C (sdd) (errors 159)

Event: Unraid Parity-Check
Subject: Notice [SERVER] - Parity-Check finished (1 errors)
Description: Duration: 3 hours, 10 minutes, 47 seconds. Average speed: 87.4 MB/s
Importance: warning

Event: Unraid Status
Subject: Notice [SERVER] - array health report [FAIL]
Description: Array has 6 disks (including parity & pools)
Importance: warning

Parity - HGST_HTS721010A9E630_JR1020BN0J90DE (sdg) - active 21 C [OK]
Disk 1 - HGST_HTS721010A9E630_JR1000D31PL5WE (sdf) - active 26 C [OK]
Disk 2 - HGST_HTS721010A9E630_JG40006PG3RD0C (sdd) - active 26 C (disk has read errors) [NOK]
Disk 3 - HGST_HTS721010A9E630_JG40006PG6PP7C (sde) - active 27 C [OK]
Disk 4 - HGST_HTS721010A9E630_JS1020620JGW6W (sdc) - active 26 C [OK]
Cache - Samsung_SSD_960_EVO_250GB_S3ESNX1JB30291E (nvme0n1) - active 35 C [OK]

Parity is valid
Last checked on Sun 12 Nov 2023 03:10:48 AM GMT (yesterday), finding 1 error.
Duration: 3 hours, 10 minutes, 47 seconds.
Average speed: 87.4 MB/s
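For readers unsure how to read the SMART error entry quoted above, the following sketch reproduces its arithmetic: the six LBA bytes from the register dump form the failing sector number, bit 6 of the error register (ER) is the ATA "uncorrectable data" flag that smartctl reports as UNC, and the power-on hours convert to days. The helper names are hypothetical, and the 512-byte logical sector size is an assumption about this drive rather than something stated in the thread.

```python
# Minimal sketch: decode the register dump from the SMART error log entry.
# Input values are copied from the quoted log; helper names are hypothetical.

LBA_BYTES = "00 00 30 05 f9 28"  # LBA_48 columns followed by LH LM LL
ERROR_REGISTER = 0x40            # the ER column
POWER_ON_HOURS = 21475
SECTOR_SIZE = 512                # assumption: logical sector size in bytes

UNC_BIT = 0x40                   # bit 6 of the ATA error register


def lba_from_hex_bytes(text):
    """Join the space-separated hex bytes, most significant first."""
    return int(text.replace(" ", ""), 16)


def hours_to_days(hours):
    """Split a power-on hour count into (whole days, remaining hours)."""
    return divmod(hours, 24)


lba = lba_from_hex_bytes(LBA_BYTES)
days, remaining_hours = hours_to_days(POWER_ON_HOURS)
is_unc = bool(ERROR_REGISTER & UNC_BIT)

print(f"LBA: {lba} (0x{lba:x})")
print(f"Uncorrectable read (UNC): {is_unc}")
print(f"Power-on time: {days} days + {remaining_hours} hours")
print(f"Approximate byte offset: {lba * SECTOR_SIZE}")
```

The decoded LBA and the days/hours split match the values smartctl printed in the log. A UNC error means the drive could not read that sector even after its own error correction, which is consistent with the read errors Unraid reported for Disk 2.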
JorgeB Posted November 20, 2023

If you didn't reboot yet please post the diagnostics.
PeteBa Posted November 20, 2023 (Author)

@JorgeB, I haven't rebooted the server so hoping the attached diagnostics are useful? Many thanks.

server-diagnostics-20231120-0951.zip
JorgeB Posted November 20, 2023 (Solution)

It's logged as a disk problem. If it happens again, and also considering the power-on hours, I would probably replace it.