codearoni Posted November 16, 2020 (edited)

Solution: If you own WD Red Plus drives and you're experiencing errors, perform the following steps:

1) Run an Extended SMART test on the drive that is reporting errors.
2) Download the SMART results when the test is complete and check Raw_Read_Error_Rate in the .txt file.
3) Raw_Read_Error_Rate should be zero for WD Red Plus drives specifically (this is not true for all HDDs; ask for help if you're using a different type of drive)!
4) If Raw_Read_Error_Rate is not zero, your Red Plus drive will need replacing. RMA it if it is under warranty.
5) For extra certainty, run an Extended SMART test on the replacement drive to ensure it's working as expected.
6) Add "1,200" (no quotes) to the SMART attribute notifications of your WD Red Plus drives (the textbox next to "Custom Attributes").

OP below:

Hi all! New Unraid user here. Everything had been working swimmingly up until my first mover job (cache dumping contents onto spinny plates). My disk1 is receiving a crazy amount of errors (screenshot).

This system, including all the drives, is brand new. I downloaded my diagnostics and found thousands of these in syslog.txt:

Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479712
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479720
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479728
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479736
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479744
Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479752

I'm currently running the SMART extended self-test on disk1. Results TBD.

My question is: Is disk1 bunk? Given that all the drives are fresh off the press, so to speak, I would expect zero errors. Could there be a software reason for all these errors, other than a bad disk? Looking for help here before moving forward with an RMA.

Cheers!

Edited December 3, 2020 by codearoni
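For anyone facing "thousands of these" in their own syslog.txt, the lines can be tallied per disk rather than read by eye. The following is a minimal sketch, assuming the md read-error lines look exactly like the six quoted above; `summarize_md_read_errors` is a hypothetical helper name, not an Unraid tool.

```python
import re

# Matches lines such as:
#   Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479712
MD_READ_ERROR = re.compile(r"md: (disk\d+) read error, sector=(\d+)")


def summarize_md_read_errors(lines):
    """Return {disk: (error_count, lowest_sector, highest_sector)}."""
    sectors = {}
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            sectors.setdefault(match.group(1), []).append(int(match.group(2)))
    return {disk: (len(s), min(s), max(s)) for disk, s in sectors.items()}


# The six lines quoted in the post above.
sample = [
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479712",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479720",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479728",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479736",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479744",
    "Nov 16 10:08:58 Alexandria kernel: md: disk1 read error, sector=15032479752",
]

summary = summarize_md_read_errors(sample)
print(summary)  # {'disk1': (6, 15032479712, 15032479752)}
```

To run it against a real log, pass an open file instead of the sample list, for example `summarize_md_read_errors(open("syslog.txt"))`. A tight sector range on a single disk, as here, is a hint worth mentioning when asking for help, though it does not by itself distinguish a bad disk from a bad connection.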
trurl Posted November 16, 2020
Syslog snippets are seldom sufficient. Without more information, my best guess is a bad connection, simply because that is the most frequent problem we see.
7 minutes ago, codearoni said: downloaded my diagnostics
Give them to us and we will have more information to understand what is happening and make recommendations. Attach the complete Diagnostics ZIP file to your NEXT post in this thread.
codearoni (Author) Posted November 16, 2020
Attaching. Thank you trurl!
alexandria-diagnostics-20201116-1414.zip
trurl Posted November 16, 2020
This one looks like it may be a disk problem:

Nov 16 03:40:22 Alexandria kernel: ata2.00: status: { DRDY SENSE ERR }
Nov 16 03:40:22 Alexandria kernel: ata2.00: error: { UNC }
Nov 16 03:40:22 Alexandria kernel: ata2.00: configured for UDMA/133
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 Sense Key : 0x3 [current]
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 ASC=0x11 ASCQ=0x4
Nov 16 03:40:22 Alexandria kernel: sd 2:0:0:0: [sdc] tag#4 CDB: opcode=0x88 88 00 00 00 00 03 80 00 4c 18 00 00 05 40 00 00
Nov 16 03:40:22 Alexandria kernel: print_req_error: I/O error, dev sdc, sector 15032405016

Let us know how the extended SMART turns out.
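The sector in the last log line can be cross-checked against the CDB bytes in the line before it. This is a minimal sketch, assuming the standard SCSI READ(16) layout (opcode 0x88, an 8-byte big-endian LBA in bytes 2 to 9, a 4-byte transfer length in bytes 10 to 13); `decode_read16` is a hypothetical helper written for this illustration. For context, sense key 0x3 is "Medium Error" in the SCSI sense key table, and ASC 0x11 / ASCQ 0x04 is listed as an unrecovered read error where auto-reallocation failed, which fits the { UNC } flag above.

```python
def decode_read16(cdb_hex):
    """Decode a SCSI READ(16) CDB given as space-separated hex bytes.

    Returns (starting_lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) command")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    length = int.from_bytes(cdb[10:14], "big")  # bytes 10..13: number of blocks
    return lba, length


# CDB bytes copied from the syslog excerpt above.
lba, blocks = decode_read16("88 00 00 00 00 03 80 00 4c 18 00 00 05 40 00 00")
print(lba)     # 15032405016
print(blocks)  # 1344
```

The decoded LBA matches the sector reported by print_req_error, so the read that failed started at exactly the address the kernel flagged. That equality assumes 512-byte logical blocks, which is an assumption about this drive rather than something stated in the thread.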
codearoni (Author) Posted November 16, 2020
Just posting an update: the extended SMART test is still at 40%. I might not have results ready until tomorrow. Thanks again for meandering through this issue with me, trurl.
codearoni (Author) Posted November 17, 2020
Attached is my SMART report for disk1. The text below the report download says "Completed without error".
alexandria-smart-20201117-0817.zip
trurl Posted November 17, 2020
That WD Red disk went from zero to this on SMART attribute 1:

ID# ATTRIBUTE_NAME       FLAGS   VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate  PO-R--  086   086   016    -    162

I would replace it.
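The row trurl quotes can be picked apart programmatically when checking several drives. What follows is a minimal sketch, assuming the eight-column brief attribute layout shown above (ID, name, flags, value, worst, thresh, fail, raw) and a plain integer in the raw column; `SmartAttr` and `parse_brief_row` are hypothetical names, and some attributes on other drives carry extra text in the raw column that this sketch does not handle.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    id: int
    name: str
    value: int   # normalized value
    worst: int   # worst normalized value seen
    thresh: int  # failure threshold for the normalized value
    raw: int     # vendor-specific raw counter


def parse_brief_row(line):
    """Parse one row of the brief SMART attribute table."""
    f = line.split()
    # f[2] is the FLAGS column and f[6] the FAIL column; both are skipped here.
    return SmartAttr(int(f[0]), f[1], int(f[3]), int(f[4]), int(f[5]), int(f[7]))


row = parse_brief_row("  1 Raw_Read_Error_Rate  PO-R--  086   086   016    -    162")
print(row.raw)                 # 162
print(row.value > row.thresh)  # True
```

Note that the normalized VALUE (86) is still above THRESH (16) and the FAIL column shows "-", so the drive itself does not flag the attribute as failed. The signal in this thread is the raw count moving off zero, which per trurl is what matters for this particular model.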
codearoni (Author) Posted November 17, 2020
Roger. Thank you so much, trurl! Just for my own notes and knowledge: can you briefly describe what you're seeing? Would a healthy disk have "000" for all of those fields?
trurl Posted November 17, 2020
1 minute ago, codearoni said: Just for my own notes and knowledge: can you briefly describe what you're seeing. Would a healthy disc have "000" for all of those fields?
Different disk models interpret that attribute differently. For WD Red, the raw value should be zero. If you have any other disks of that model, click on each one to get to its page and set Unraid to monitor that attribute.
codearoni (Author) Posted November 19, 2020
Just an update: I spun down the array and removed the disk. It's currently in RMA. After I get the replacement I'll start a rebuild. When it's all said and done, I'll update the OP with the steps I used to triage this issue. Hoping it'll help future WD Red owners.
codearoni (Author) Posted November 28, 2020
Hi trurl! While I've been waiting on my RMA'd disk, I've been looking into setting up Unraid to monitor said attribute for my WD Red drives. I've looked at the wiki plus these forums, but am unsure how to add monitoring as discussed above. I assume I go to the disk page and enter a custom attribute (screenshot of what I'm talking about attached)? Is this correct? What would the syntax for this custom attribute look like?
trurl Posted November 28, 2020
Just as it says:
Custom attributes (use comma to separate numbers)
You want 1 and 200, so just put 1,200 in the blank and APPLY.
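To make the "1,200" entry concrete: it is a comma-separated list of two attribute IDs, 1 and 200, not the number twelve hundred. The sketch below only illustrates that idea and is not how Unraid implements its notifications (the thread does not show that). The function names are hypothetical, the rule "flag when a raw value goes up" is an assumption, and the attribute 200 values are made up; only the jump from 0 to 162 on attribute 1 comes from the thread.

```python
def parse_custom_attributes(text):
    """Turn a field value like '1,200' into a list of attribute IDs."""
    return [int(part) for part in text.split(",") if part.strip()]


def changed_attributes(monitored, previous, current):
    """Return {id: (old_raw, new_raw)} for monitored IDs whose raw value rose.

    `previous` and `current` map attribute ID to raw value; a missing ID
    is treated as 0 (an assumption made for this sketch).
    """
    changed = {}
    for attr_id in monitored:
        old = previous.get(attr_id, 0)
        new = current.get(attr_id, 0)
        if new > old:
            changed[attr_id] = (old, new)
    return changed


monitored = parse_custom_attributes("1,200")
print(monitored)  # [1, 200]

# Attribute 1 went from zero to 162 (from the thread); attribute 200 is illustrative.
before = {1: 0, 200: 0}
after = {1: 162, 200: 0}
print(changed_attributes(monitored, before, after))  # {1: (0, 162)}
```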
codearoni (Author) Posted November 30, 2020
Thanks trurl. Looks good now. I was making it more complicated than it needed to be (i.e. entering "Attribute = 0" to try to match the checkboxes below).

Final question: I'll be rebuilding the array soon. I am adding a 2nd parity drive and one more storage drive. Should I:
1) spin up the array with the replacement disk ONLY and rebuild FIRST, then spin down the array and add the new drives, or
2) spin up the array with the replacement disk plus the new drives, and rebuild all together?

I couldn't find any documentation on this particular scenario in the wiki. I would prefer to do #2 as I imagine it'll be faster, but I am obviously more interested in doing this correctly than quickly.
trurl Posted November 30, 2020
Assuming you mean the new data disk is for a new slot, you can do #2 with New Config. If the new data disk isn't clear, you will have to rebuild parity1 at the same time, so there is no protection until it is done.
codearoni (Author) Posted November 30, 2020
Thanks trurl. Just to be clear: I'll be moving from 1x Parity and 3x Data drives to 2x Parity and 4x Data drives. It sounds like adding a 2nd parity will require a rebuild of Parity #1, so I might be better off doing #1: just adding the replacement data disk and rebuilding the array. Then afterwards, spinning down the array and adding Parity #2 and Data #4?
itimpi Posted November 30, 2020
8 minutes ago, codearoni said: Then afterwards, spinning down the array, and adding Parity #2 and Data #4?
I have a feeling that Unraid will not allow these to be done in one step, as adding the extra data drive starts a clear operation and adding a parity drive starts a parity sync operation, and you cannot run both of these at the same time.
codearoni (Author) Posted November 30, 2020
Right on, ty itimpi! So as a general rule of thumb: adding multiple data drives at once = fine. Adding data + parity drives = not (do them as separate tasks). Makes sense. I'm just a new user and don't want to make any assumptions as to how Unraid operates.
trurl Posted November 30, 2020
I should have reviewed the thread, since I overlooked the fact that you were replacing a disk. Of course that has to be done separately and before any other changes. You can add data and parity drives at the same time (through New Config), but you must replace / rebuild a disk separately. If the disk actually needs replacing due to problems, then that should be done before anything else.
codearoni (Author) Posted December 1, 2020
No worries trurl, I imagine you're managing thousands of threads on this board lol. I've begun a rebuild but it'll take 12 hours. Probably tomorrow I'll pop on and update the OP with a summary of steps taken for these particular drives (WD Red Plus).
codearoni (Author) Posted December 2, 2020
Everything has been updating swimmingly; it's just taking a while given the drives I got (14 hours each). I had a question about extended SMART tests though: can I run them while the array is up and running? Will things like mover jobs be interrupted by extended SMART tests if I run them at night?
JorgeB Posted December 2, 2020
17 minutes ago, codearoni said: can I run them while the array is up and running?
Yes, they run in the background, but avoid heavy I/O or they will take much longer.
codearoni (Author) Posted December 3, 2020
Thanks to everyone for the help on this issue! I've updated my OP with my triage steps. Hopefully it'll help future WD Red users. I've got my array back online. The rebuild process was incredibly easy. The hardest part of this whole thing was waiting on the RMA drive. It's only strengthened the idea that Unraid was the right choice for my NAS.