January 4, 2013

Hello All,

I'm seeing errors on two of my drives in the web interface and I'm not sure where to go from here. I guess I've had these errors for a while but hadn't noticed them until now.

I have 6 drives in my array (parity + 5 for data). The errors are showing up on parity and on disk5. I attached a quick screenshot of my web interface main page, and also my full syslog. I had to put the syslog in a zip file, otherwise it wouldn't fit; I hope that's OK. I'm running unRAID version 4.7. The short SMART tests are also attached for both drives.

It seems to have encountered a few of the errors when I did a parity check on Nov 12th:

Nov 12 11:36:42 nas0 kernel: md: recovery thread woken up ...
Nov 12 11:36:42 nas0 kernel: md: recovery thread checking parity...
Nov 12 11:36:42 nas0 kernel: md: using 1152k window, over a total of 1953514552 blocks.
Nov 12 11:37:00 nas0 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 12 11:37:00 nas0 kernel: ata1.00: BMDMA stat 0x25
Nov 12 11:37:00 nas0 kernel: ata1.00: failed command: READ DMA EXT
Nov 12 11:37:00 nas0 kernel: ata1.00: cmd 25/00:00:40:f2:08/00:04:00:00:00/e0 tag 0 dma 524288 in
Nov 12 11:37:00 nas0 kernel: res 51/40:7f:b8:f5:08/40:00:00:00:00/e0 Emask 0x9 (media error)
Nov 12 11:37:00 nas0 kernel: ata1.00: status: { DRDY ERR }
Nov 12 11:37:00 nas0 kernel: ata1.00: error: { UNC }
Nov 12 11:37:00 nas0 kernel: ata1.00: configured for UDMA/133
Nov 12 11:37:00 nas0 kernel: ata1: EH complete
Nov 12 11:37:03 nas0 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 12 11:37:03 nas0 kernel: ata1.00: BMDMA stat 0x25
Nov 12 11:37:03 nas0 kernel: ata1.00: failed command: READ DMA EXT
Nov 12 11:37:03 nas0 kernel: ata1.00: cmd 25/00:00:40:f2:08/00:04:00:00:00/e0 tag 0 dma 524288 in
Nov 12 11:37:03 nas0 kernel: res 51/40:7f:b8:f5:08/40:00:00:00:00/e0 Emask 0x9 (media error)
Nov 12 11:37:03 nas0 kernel: ata1.00: status: { DRDY ERR }
Nov 12 11:37:03 nas0 kernel: ata1.00: error: { UNC }
Nov 12 11:37:03 nas0 kernel: ata1.00: configured for UDMA/133
Nov 12 11:37:03 nas0 kernel: ata1: EH complete

And then some more on Nov 27:

Nov 27 11:17:10 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Nov 27 11:17:10 nas0 kernel: res 51/40:08:c0:61:c7/40:00:d5:00:00/e0 Emask 0x9 (media error) (Errors)
Nov 27 11:17:10 nas0 kernel: ata3.00: error: { UNC } (Errors)

And then a whole bunch on Dec 20:

Dec 20 16:33:08 nas0 apcupsd[1706]: Power failure.
Dec 20 16:33:10 nas0 apcupsd[1706]: Power is back. UPS running on mains.
Dec 20 17:22:58 nas0 mountd[30357]: authenticated mount request from 192.168.210.40:52483 for /mnt/user/media (/mnt/user/media)
Dec 20 17:26:45 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 20 17:26:45 nas0 kernel: ata3.00: BMDMA stat 0x25
Dec 20 17:26:45 nas0 kernel: ata3.00: failed command: READ DMA EXT
Dec 20 17:26:45 nas0 kernel: ata3.00: cmd 25/00:00:a0:4e:ff/00:04:d5:00:00/e0 tag 0 dma 524288 in
Dec 20 17:26:45 nas0 kernel: res 51/40:bf:d8:51:ff/40:00:d5:00:00/e0 Emask 0x9 (media error)
Dec 20 17:26:45 nas0 kernel: ata3.00: status: { DRDY ERR }
Dec 20 17:26:45 nas0 kernel: ata3.00: error: { UNC }
Dec 20 17:26:45 nas0 kernel: ata3.00: configured for UDMA/133
Dec 20 17:26:45 nas0 kernel: ata3: EH complete
Dec 20 17:26:54 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 20 17:26:54 nas0 kernel: ata3.00: BMDMA stat 0x25
Dec 20 17:26:54 nas0 kernel: ata3.00: failed command: READ DMA EXT
Dec 20 17:26:54 nas0 kernel: ata3.00: cmd 25/00:00:38:24:01/00:04:d6:00:00/e0 tag 0 dma 524288 in
Dec 20 17:26:54 nas0 kernel: res 51/40:5f:d8:25:01/40:02:d6:00:00/e0 Emask 0x9 (media error)
Dec 20 17:26:54 nas0 kernel: ata3.00: status: { DRDY ERR }
Dec 20 17:26:54 nas0 kernel: ata3.00: error: { UNC }
Dec 20 17:26:54 nas0 kernel: ata3.00: configured for UDMA/133
Dec 20 17:26:54 nas0 kernel: ata3: EH complete
Dec 20 17:31:11 nas0 apcupsd[1706]: Power failure.
Dec 20 17:31:13 nas0 apcupsd[1706]: Power is back. UPS running on mains.
.
(more errors omitted -- see full syslog)
.

I seem to have had some power issues on Dec 20; I'm not sure if that had anything to do with it. I have the system connected to an APC battery backup, and the system stayed running for the entire duration.

If anyone is available to help me with what my next step should be, I'd greatly appreciate it.

System specs:
CPU: AMD Athlon64 4000+
Motherboard: MSI K8N Neo4 Platinum
Power Supply: Ultra X4 750W
RAM: 2 GB DDR
Storage: Parity - Western Digital 2TB SATA
Storage: Disk1 - Western Digital 2TB SATA
Storage: Disk2 - Western Digital 750GB SATA
Storage: Disk3 - Western Digital 1TB SATA
Storage: Disk4 - Seagate 1.5 TB SATA
Storage: Disk5 - Western Digital 2TB SATA

Attachments: syslog-2013-01-04.zip, SMART-parity.txt, SMART-disk5.txt
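For anyone reading these entries later: the hex strings after "cmd" and "res" can be decoded to see which sector failed. Below is a rough Python sketch, not an official tool. It assumes the usual libata field order for an LBA48 command (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function names are made up for illustration.

```python
import re
from collections import Counter

def decode_taskfile(dump):
    """Return (lba, sector_count) from a 'cmd' or 'res' taskfile dump.

    Assumed field order (LBA48):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    b = [int(x, 16) for x in re.split(r"[/:]", dump)]
    lba = b[3] | b[4] << 8 | b[5] << 16 | b[8] << 24 | b[9] << 32 | b[10] << 40
    count = b[2] | b[7] << 8  # only meaningful as a request size on 'cmd' lines
    return lba, count

def unc_errors_by_port(lines):
    """Count 'error: { UNC }' lines per ATA port (ata1, ata3, ...)."""
    counts = Counter()
    for line in lines:
        m = re.search(r"kernel: (ata\d+)\.\d+: error: \{ UNC \}", line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Values copied from the Nov 12 entries in the post above.
cmd_lba, cmd_count = decode_taskfile("25/00:00:40:f2:08/00:04:00:00:00/e0")
res_lba, _ = decode_taskfile("51/40:7f:b8:f5:08/40:00:00:00:00/e0")
print(cmd_lba, cmd_count, cmd_count * 512)  # 586304 1024 524288
print(res_lba)                              # 587192
```

Under that assumption, the Nov 12 read asked for 1024 sectors (524288 bytes, which matches "dma 524288" in the log) starting at LBA 586304, and the drive reported the unreadable sector at LBA 587192, inside that range. Running `unc_errors_by_port` over the full syslog would give a per-port count of UNC errors.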
January 4, 2013

What are the specs of your unRAID server? Have you done a SMART test on those 2 drives? Post the SMART reports so people can check those as well. I'm new to unRAID, but these are the steps I would take.
January 4, 2013 (Author)

Thanks -- I modified the original post to include that information. System specs are at the bottom and SMART test reports are attached as .txt files.
January 4, 2013

Start planning to replace both disks. The first thing to do is check whether the error counts keep increasing. If they stay stable, the disks may still be fine, but in my experience errors like these mean replacements are in order.
January 5, 2013

Both drives have unreadable sectors, indicated by the non-zero current_pending_sector counts, and need to be rebuilt. Can you copy all of the data off of disk 5?
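To make the pending-sector check concrete, here is a small Python sketch that pulls raw attribute values out of text in the layout that `smartctl -A` normally prints. The sample rows and their numbers are invented for illustration and are not taken from the reports attached to this thread; `parse_smart_attributes` is a hypothetical helper name.

```python
def parse_smart_attributes(report):
    """Map attribute name -> raw value (int) from `smartctl -A` style text."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                pass  # some raw values are not plain integers; skip them
    return attrs

# Illustrative sample only (made-up values, NOT the poster's drives).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
print(attrs["Current_Pending_Sector"])  # 12
```

A non-zero raw value for attribute 197 (Current_Pending_Sector) is the condition described in the post above: sectors the drive could not read and has not yet been able to rewrite or reallocate.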
January 5, 2013 (Author)

That's exactly what I'm doing right now. I'm moving everything off of it onto the other drives, which luckily have plenty of free space to accommodate what I've got on disk 5.

What would the process be after that's done?

1) I'd have to get disk5 either removed from the array or replaced with a new undamaged drive.
2) Either way, I'd have to rebuild parity, is that right? Which means I'd need to get the parity drive replaced beforehand.
January 5, 2013

The drives are probably OK. They should be pre-cleared and then the SMART reports checked for any remaining pending sectors. If sectors are still pending after a pre-clear, the drive should be retired or RMAed. If there are no pending sectors and the overall status is PASSED, then use the drive.
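The rule in the post above can be written down as a tiny Python function. This is only a sketch of that rule; the function name is hypothetical, and the third outcome (no pending sectors but a status other than PASSED) is my own assumption, since the post does not cover it.

```python
def drive_verdict(pending_after_preclear, overall_status):
    """Apply the post-pre-clear rule: pending sectors decide first, then status."""
    if pending_after_preclear > 0:
        return "retire or RMA"
    if overall_status.strip().upper() == "PASSED":
        return "use"
    # Not covered by the rule in the post; assumed to need a closer look.
    return "investigate further"

print(drive_verdict(3, "PASSED"))   # retire or RMA
print(drive_verdict(0, "PASSED"))   # use
```

The inputs would come from the SMART report taken after the pre-clear finishes: the raw Current_Pending_Sector value and the overall health result.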
January 5, 2013 (Author)

Thanks for the replies, I appreciate it! What should be my first step once I get all of my data moved off of disk5? Something like stopping the array and removing those two drives from the configuration so I can do the pre-clears?

-----------------------------------------------
EDIT

OK, so here's the situation and here's what I think I want to do about it.

-) I've got a new drive on order (2 TB) that should be here on Tuesday. I'd like to use this drive to replace my current parity drive.
-) I'm currently making sure that I have everything off of my disk5 data disk. What I'd like to do with this, for now, is to just remove it from the array (with no immediate need to re-add it, as I've got plenty of room on my other 4 data drives for now).
-) After I get the parity drive swapped and the disk5 drive removed, I can start pre-clears on them (either on the same system or on an alternate one) so that I can determine if I need to RMA them. Luckily they're both still under warranty (whew!)

The big question is, what order do I do everything in? It's almost like I want to just start a new array, using existing data on my 4 data drives. I'll just effectively be building a brand new parity disk based on those 4.

I don't know if this is correct, so please correct any of these steps for me or let me know what I should do instead.

1) Stop the array, remove parity and disk5 from the configuration ?
2) Shut system down, put new parity drive in, start system back up. I'm thinking since the configuration is missing parity, the array won't auto-start ?
3) Run pre-clear on new parity drive ?
4) Do an initconfig to wipe the array configuration (after backing up configs, noting all of my configuration settings, drive mappings, etc.) ?
5) Add new parity drive into the array in the parity slot ?
6) Start the array so it can now re-build parity on the brand new drive ?
Again, I have question marks behind all of those steps because I'm looking for someone to either confirm or correct what I should do. Any help is greatly appreciated!

- Ryan
January 6, 2013

One last step after you have successfully completed your list: you should do a non-correcting parity check to verify that parity was written correctly and is completely readable. A parity check is extra insurance that everything is working correctly, and it goes through mostly the same mechanics needed to rebuild one failed drive, so if the check completes with zero errors, you should be good to go.
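To illustrate why a clean parity check says something about rebuilds, here is a simplified Python model of single XOR parity. It is a toy sketch working on short byte strings, not unRAID's actual driver code, and the function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_errors(data_disks, parity):
    """Non-correcting check: count mismatching positions, write nothing."""
    expected = xor_blocks(data_disks)
    return sum(1 for e, p in zip(expected, parity) if e != p)

def rebuild(surviving_disks, parity):
    """Recover the one missing disk from the survivors plus parity."""
    return xor_blocks(surviving_disks + [parity])

# Toy "disks" of two bytes each.
disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(disks)

print(parity_errors(disks, parity))                     # 0
print(rebuild([disks[0], disks[2]], parity) == disks[1])  # True
```

Both the check and the rebuild read every disk and XOR the same positions together; the check compares the result with what is stored, while the rebuild writes it to the replacement drive. That shared read path is why a check with zero errors is a reasonable sign that a rebuild would succeed.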
January 7, 2013 (Author)

Thanks, I'll add that to my list. I usually try to do one of those once every month, but time makes fools of us all! I may set that up to be automatic after I get everything done.
January 10, 2013 (Author)

Thanks for all of your help, guys. I got the two bad drives removed from the system, my new replacement parity drive passed pre-clear, and I'm currently rebuilding parity. Thanks again!