Brand-new HDD went into "Error State" during an Unraid data rebuild

JorgeB · July 18, 2023

If the emulated disk is you can try again. You still have issues with the cache pool, so don't forget that extended SMART test.

Nanuk_ · July 18, 2023

Score! You da man @itimpi

Before the rebuild, should I still run Preclear, or only if it prompts me?

Edited July 18, 2023 by Nanuk_

Nanuk_ · July 18, 2023

Ah, never mind. The moment I did the last step you gave, it started the data rebuild automatically =D

I'm surprised it didn't require a pre-clear first. Is that a 6.12 thing? The previous version required it.

Edited July 18, 2023 by Nanuk_

Nanuk_ · July 18, 2023

Darn, I only ran the short SMART test @JorgeB. It passed, though!

Thanks so much for the help, guys! I now know what to do in that situation, and that really helps! You both rock @JorgeB @itimpi!

itimpi · July 18, 2023

1 minute ago, Nanuk_ said:

it passed though!

Unfortunately, the short test is nowhere near a thorough test of the drive.

The long test lives up to its name, taking hours per TB. Also, progress only updates at 10% intervals.

Nanuk_ · July 18, 2023

Damn, should I stop the data rebuild and run the test?

JorgeB · July 18, 2023

You can do both at the same time, since that disk is not in the array.

Nanuk_ · July 18, 2023

Are you sure @JorgeB? It looks like the data rebuild is writing data to it.

JorgeB · July 18, 2023

3 hours ago, JorgeB said:

Also, there are still errors on cache1. Assuming the cables were replaced, run an extended SMART test and post the results.

Not that disk.

Nanuk_ · July 18, 2023

Ah! Thank you for clearing that up! Starting the Extended test now =D

My future goal is to retire this HBA and move the cache to two onboard NVMe drives, since this HBA has bricked 3 SSDs and possibly a 10 TB drive. For some reason these four old 1 TB drives are fine. That's what I get for buying from the local Amazon ripoff here in the Philippines. I've learned my lesson.

I plan to buy a replacement from "The Art of the Server" in the future and just ship it here. At least it'll be from a trusted brand.

[Screenshot]

Edited July 19, 2023 by Nanuk_

Nanuk_ · July 19, 2023

Here's the result of the long SMART test. Sorry, I shut down the desktop whose browser had it open.

[Screenshot]

JorgeB · July 19, 2023

Did you replace the cables for that disk? It was still showing issues in the last diags.

Nanuk_ · July 19, 2023

Yes, I replaced the cable. I've actually replaced it 3 times, but the CRC errors still persist, so I'm gonna assume it's the HBA, which I plan to eventually replace. Since the only reputable HBA vendor I know is in the US and I'm in the PH, I'll have to be patient, buy 2 NVMe drives as a replacement cache when I can afford them, and retire this HBA.

Also, the HDDs for the cache are really old 1 TB drives; I only used them because the SSDs got bricked. They're pretty much on their last legs.
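Whether a cable swap fixed the problem comes down to whether the CRC counter keeps climbing. On most drives SMART attribute 199 (UDMA_CRC_Error_Count) is a cumulative lifetime counter that does not reset when a cable is replaced, so a nonzero value alone only says errors happened at some point. Below is a minimal sketch, assuming you copy the attribute's raw value from two SMART reports taken before and after the change; the function name and sample values are hypothetical.

```python
def crc_errors_still_rising(before: int, after: int) -> bool:
    """Return True if the UDMA CRC error counter grew between two readings.

    The counter is cumulative, so only an increase indicates errors that are
    still happening; a flat value means the existing count is historical.
    """
    if after < before:
        # A lifetime counter should never decrease; the readings likely
        # come from different drives or were mixed up.
        raise ValueError("counter went backwards; check the readings")
    return after > before


# Hypothetical raw values of attribute 199 read before and after a cable swap.
readings = [(120, 120), (120, 187)]
for before, after in readings:
    status = "still rising" if crc_errors_still_rising(before, after) else "stable"
    print(f"{before} -> {after}: {status}")
```

If the count keeps rising across several different cables, the controller or backplane (here, the suspect HBA) becomes the stronger suspect.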

Edited July 19, 2023 by Nanuk_

Nanuk_ · July 22, 2023

The new HDD is throwing errors again. I'm starting a SMART extended self-test and will post the diagnostics afterward. Not sure why it's throwing errors again. I really hope it's not a dud.

[Screenshot]
[Screenshot]

Nanuk_ · July 23, 2023

Here are the logs. I really hope it's something normal or fixable; this was the replacement for the HDD that had to be RMA'd.

[Screenshot]

trojancarabao-smart-20230723-0854.zip trojancarabao-diagnostics-20230723-1245.zip

Nanuk_ · July 23, 2023

Apparently Fix Common Problems is now complaining about it.

Nanuk_ · July 23, 2023

Followed the Fix Common Problems advice and rebooted.

Nanuk_ · August 13, 2023

JorgeB · August 14, 2023

Are you rebuilding disk2?

Nanuk_ · August 14, 2023

Yes, I'm trying to follow the steps you all taught me:
1.) First, I replaced the SATA cable. (DONE)
2.) Then I disabled spindown. (DONE)
3.) I placed it into maintenance mode. (DONE)
4.) I ran an extended SMART test. (DONE)
5.) Then I ran an XFS repair. (DONE)
6.) Now I'm rebuilding. (IN PROGRESS)

trojancarabao-smart-20230814-1328.zip trojancarabao-diagnostics-20230814-1719.zip

Edited August 14, 2023 by Nanuk_

JorgeB · August 14, 2023

59 minutes ago, Nanuk_ said:

(IN PROGRESS)

Then that error (disk invalid) is normal, it will change to OK if/when it finishes rebuilding.

Nanuk_ · September 20, 2023

Hi, so I bought 2 new HDDs and pre-cleared both, but after a day of use one of them got an error. I have a feeling it might be because that drive is hooked up to my HBA, which I think might be faulty. I set the server to maintenance mode and I'm running an extended test. I'll run XFS Repair afterward and post both the test results and diagnostics here when I'm done.

[Screenshot]

Nanuk_ · September 20, 2023

ATA Error Count: 44315 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 44315 occurred at disk power-on lifetime: 30 hours (1 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
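The error-log header above describes the Powered_Up_Time format (DDd+hh:mm:SS.sss) and says it wraps after 49.710 days. That figure matches a 32-bit millisecond counter: 2^32 ms is about 49.7103 days. Below is a minimal sketch that converts such a timestamp to milliseconds and computes the gap between two entries while allowing for one wrap. It assumes the layout exactly as described, treats the "DDd+" day prefix as optional (an assumption for entries logged within the first day), and the helper names are hypothetical.

```python
import re

# A 32-bit millisecond counter wraps after 2**32 ms, i.e. about 49.710 days.
WRAP_MS = 2**32

# DDd+hh:mm:SS.sss; the "DDd+" prefix is treated as optional (assumption).
_TIMESTAMP = re.compile(r"^(?:(\d+)d\+)?(\d{2}):(\d{2}):(\d{2})\.(\d{3})$")


def powered_up_ms(stamp: str) -> int:
    """Convert a Powered_Up_Time string to milliseconds since power-on."""
    m = _TIMESTAMP.match(stamp.strip())
    if m is None:
        raise ValueError(f"unrecognized timestamp: {stamp!r}")
    days = int(m.group(1) or 0)
    hours, minutes, seconds, millis = (int(g) for g in m.group(2, 3, 4, 5))
    return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000 + millis


def elapsed_ms(earlier: str, later: str) -> int:
    """Milliseconds from earlier to later, allowing for at most one wrap."""
    return (powered_up_ms(later) - powered_up_ms(earlier)) % WRAP_MS


print(round(WRAP_MS / 86_400_000, 3))                   # 49.71
print(powered_up_ms("1d+06:00:00.000"))                 # 108000000
print(elapsed_ms("49d+17:02:47.000", "00:00:01.000"))   # 1296
```

Because of the wrap, these timestamps are only useful for ordering errors that occurred close together; the "disk power-on lifetime" in hours is the field to use for when an error happened over the drive's life.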