[Help] Is my disk DOA?


codearoni


Solution:

If you own WD Red Plus drives and are seeing errors, perform the following steps:

1) Run an Extended SMART test on the drive that is reporting errors.

2) Download the SMART results when the test is complete and check Raw_Read_Error_Rate (the RAW_VALUE column) in the .txt file.

3) Raw_Read_Error_Rate should be zero for WD Red Plus drives specifically (this is not true for all HDDs; ask for help if you're using a different type of drive)!

4) If the Raw_Read_Error_Rate is not zero, your Red Plus drive will need replacing. RMA it if under warranty.

5) For extra certainty, run an Extended SMART test on the replacement drive to ensure it's working as expected.

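6) Add "1,200" (no quotes) to the SMART attribute notifications of your WD Red Plus drives (the textbox next to "Custom Attributes"). The entry is a comma-separated list of attribute IDs: 1 is Raw_Read_Error_Rate and 200 is Multi_Zone_Error_Rate on these drives.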
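
Steps 2 to 4 can be sketched as a small script. This is only a rough illustration, assuming the downloaded .txt contains the usual smartctl attribute table with the raw value in the tenth column. The sample report below is made up for the example (it is not from this thread), and the function names are hypothetical.

```python
import re

# Made-up sample in the layout of a smartctl attribute table (NOT from this thread).
SAMPLE_REPORT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       37
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
"""


def raw_attribute_values(report_text, wanted_ids=(1, 200)):
    """Return {attribute_id: (name, raw_value)} for the wanted attribute IDs."""
    found = {}
    for line in report_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id not in wanted_ids:
            continue
        # RAW_VALUE is the 10th column; keep only its leading integer.
        match = re.match(r"\d+", fields[9])
        if match:
            found[attr_id] = (fields[1], int(match.group()))
    return found


def nonzero_attributes(report_text, wanted_ids=(1, 200)):
    """Return only the watched attributes whose raw value is not zero."""
    values = raw_attribute_values(report_text, wanted_ids)
    return {attr_id: item for attr_id, item in values.items() if item[1] != 0}


if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(nonzero_attributes(SAMPLE_REPORT).items()):
        print(f"Attribute {attr_id} ({name}) has non-zero raw value {raw}")
```

To use it on a real report, read the downloaded .txt into a string and pass it in place of SAMPLE_REPORT. As in step 3, a non-zero result only means something for WD Red Plus drives.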

 

OP Below:
Hi all! New Unraid user here.

Everything has been working swimmingly up until my first mover job (cache dumping contents onto spinny plates).

 

My disk1 is receiving a crazy amount of errors, screenshot:

[Attached screenshot: 1216363019_ScreenShot2020-11-16at1_53_02PM.png]

 

This system, including all the drives, is brand new.

 

I downloaded my diagnostics and found thousands of these in syslog.txt:

 

Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479712
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479720
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479728
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479736
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479744
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479752
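
For anyone wanting to summarise lines like these rather than scroll through thousands of them, here is a minimal sketch. It assumes every line follows exactly the "md: diskN read error, sector=..." pattern shown above; the function name is hypothetical, and the sample input is just the six lines quoted in this post.

```python
import re
from collections import defaultdict

# Matches the md driver lines quoted above, e.g.
# "... kernel: md: disk1 read error, sector=15032479712"
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error, sector=(\d+)")

SAMPLE_LINES = [
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479712",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479720",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479728",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479736",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479744",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479752",
]


def summarize_read_errors(lines):
    """Return {disk: (error_count, first_sector, last_sector)}."""
    sectors = defaultdict(list)
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            sectors[match.group(1)].append(int(match.group(2)))
    return {
        disk: (len(values), min(values), max(values))
        for disk, values in sectors.items()
    }


if __name__ == "__main__":
    for disk, (count, first, last) in summarize_read_errors(SAMPLE_LINES).items():
        print(f"{disk}: {count} read errors, sectors {first} to {last}")
```

With a real syslog.txt, pass the open file object instead of SAMPLE_LINES. A tight sector range on one disk, as here, at least shows the errors are clustered rather than spread over the whole drive.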
 

I'm currently running the SMART extended self-test on disk1. Results TBD.


My question is: Is disk1 bunk? Given the fact that all the drives are fresh off the press, so to speak, I would expect zero errors.

Could there be a software reason for all these errors, outside of a bad disk? Looking for help here before moving forward with an RMA. Cheers!


Syslog snippets are seldom sufficient. Without more information, the best guess is a bad connection, simply because that is the most frequent problem we see.

7 minutes ago, codearoni said:

downloaded my diagnostics

Give them to us and we will have more information to understand what is happening and make recommendations.

 

Attach complete Diagnostics ZIP file to your NEXT post in this thread.


This one looks like it may be a disk problem:

Nov 16 03:40:22 Alexandria kernel: ata2.00: status: { DRDY SENSE ERR }
Nov 16 03:40:22 Alexandria kernel: ata2.00: error: { UNC }
Nov 16 03:40:22 Alexandria kernel: ata2.00: configured for UDMA/133
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 Sense Key : 0x3 [current] 
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 ASC=0x11 ASCQ=0x4 
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 CDB: opcode=0x88 88 00 00 00 00 03 80 00 4c 18 00 00 05 40 00 00
Nov 16 03:40:22 Alexandria kernel: print_req_error: I/O error, dev sdc, sector 15032405016

Let us know how the extended SMART turns out.
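
For reference, the fields in that excerpt can be decoded by hand. In the SCSI tables, sense key 0x3 is MEDIUM ERROR and ASC=0x11 ASCQ=0x4 is "unrecovered read error - auto reallocate failed", which is why this points at the disk rather than the cable. Opcode 0x88 is READ(16), so the CDB also carries the starting LBA and the transfer length. A minimal sketch of that decoding follows; the function name is hypothetical and it only handles READ(16).

```python
def decode_read16(cdb_hex):
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Returns (starting_lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    # Bytes 2..9 hold the 64-bit big-endian logical block address.
    lba = int.from_bytes(cdb[2:10], "big")
    # Bytes 10..13 hold the 32-bit big-endian transfer length.
    length = int.from_bytes(cdb[10:14], "big")
    return lba, length


if __name__ == "__main__":
    # CDB bytes copied from the syslog line above.
    lba, length = decode_read16("88 00 00 00 00 03 80 00 4c 18 00 00 05 40 00 00")
    print(f"READ(16) starting at LBA {lba} for {length} blocks")
```

The decoded LBA is 15032405016, the same sector that print_req_error reports on the last line of the excerpt.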

1 minute ago, codearoni said:

Just for my own notes and knowledge: can you briefly describe what you're seeing? Would a healthy disk have "000" for all of those fields?

Different disk models interpret that attribute differently. For WD Red it should be zero. If you have any other disks of that model, click on each one to get to its page and set Unraid to monitor that attribute.

  • 2 weeks later...

Hi trurl! While I've been waiting on my RMA'd disk, I've been looking into setting up Unraid to monitor said attribute for my WD Red drives.

I've looked at the wiki plus these forums, but am unsure how to add monitoring as discussed above. I assume I go to the disk page, and enter a custom attribute (screenshot of what I'm talking about attached)? Is this correct? What would the syntax for this custom attribute look like? 

Screen Shot 2020-11-28 at 3.41.17 PM.png


Thanks trurl. Looks good now. I was making it more complicated than it needed to be (e.g. entering "Attribute = 0" to try to match the checkboxes below).

 

Final question: I'll be rebuilding the array soon. I am adding a 2nd parity drive and one more storage drive. 

Should I:
1) spin up the array with the replacement disk ONLY and rebuild FIRST, then spin down the array and add the new drives, or
2) spin up the array with the replacement disk plus the new drives, and rebuild everything together?

Couldn't find any documentation on this particular scenario in the wiki. I would prefer to do #2 as I imagine it'll be faster, but I'm obviously more interested in doing this correctly than quickly.


Thanks trurl. Just to be clear: I'll be moving from 1x Parity and 3x Data drives to 2x Parity and 4x Data drives.
Sounds like adding a 2nd parity will require a rebuild on Parity #1...so I might be better off doing #1, just adding the replacement data disk and rebuilding the array. Then afterwards, spinning down the array, and adding Parity #2 and Data #4? 

8 minutes ago, codearoni said:

Then afterwards, spinning down the array, and adding Parity #2 and Data #4?

I have a feeling that Unraid will not allow these to be done in one step as adding the extra data drive starts a clear operation and adding a parity drive starts a parity sync operation - and you cannot run both of these at the same time.


I should have reviewed the thread since I overlooked the fact you were replacing a disk. Of course that has to be done separately and before any other changes.

 

You can add data and parity drives at the same time (through new config), but you must replace / rebuild a disk separately. If the disk actually needs replacing due to problems then that should be done before anything else.


Everything has been updating swimmingly, just taking a while given the drives I got (14 hours each). 

I had a question about extended SMART tests though: can I run them while the array is up and running?
Will things like mover jobs be interrupted by extended SMART tests if I run them at night? 


Thanks to everyone for the help on this issue!
I've updated my OP with my triage steps. Hopefully it'll help future WD Red users.
I've got my array back online. The rebuild process was incredibly easy. Hardest part of this whole thing was waiting on the RMA drive. It's only strengthened the idea that Unraid was the right choice for my NAS. 

