[SOLVED] Errors showing up on two drives

January 4, 2013

Hello All,

I'm seeing errors on two of my drives in the web interface, and I'm not sure where to go from here.

I guess I've had these errors for a while but haven't noticed them until now. I have 6 drives in my array (parity + 5 for data). The errors are showing up on parity, and on disk5.

I attached a quick screenshot of my web interface main page, and also my full syslog. I had to zip the syslog because it wouldn't fit otherwise; I hope that's OK. I'm running unRAID version 4.7. The short SMART tests are also attached for both drives.

It seems to have encountered a few of them when I did a parity check on Nov 12th:

Nov 12 11:36:42 nas0 kernel: md: recovery thread woken up ...
Nov 12 11:36:42 nas0 kernel: md: recovery thread checking parity...
Nov 12 11:36:42 nas0 kernel: md: using 1152k window, over a total of 1953514552 blocks.
Nov 12 11:37:00 nas0 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 12 11:37:00 nas0 kernel: ata1.00: BMDMA stat 0x25
Nov 12 11:37:00 nas0 kernel: ata1.00: failed command: READ DMA EXT
Nov 12 11:37:00 nas0 kernel: ata1.00: cmd 25/00:00:40:f2:08/00:04:00:00:00/e0 tag 0 dma 524288 in
Nov 12 11:37:00 nas0 kernel:          res 51/40:7f:b8:f5:08/40:00:00:00:00/e0 Emask 0x9 (media error)
Nov 12 11:37:00 nas0 kernel: ata1.00: status: { DRDY ERR }
Nov 12 11:37:00 nas0 kernel: ata1.00: error: { UNC }
Nov 12 11:37:00 nas0 kernel: ata1.00: configured for UDMA/133
Nov 12 11:37:00 nas0 kernel: ata1: EH complete
Nov 12 11:37:03 nas0 kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov 12 11:37:03 nas0 kernel: ata1.00: BMDMA stat 0x25
Nov 12 11:37:03 nas0 kernel: ata1.00: failed command: READ DMA EXT
Nov 12 11:37:03 nas0 kernel: ata1.00: cmd 25/00:00:40:f2:08/00:04:00:00:00/e0 tag 0 dma 524288 in
Nov 12 11:37:03 nas0 kernel:          res 51/40:7f:b8:f5:08/40:00:00:00:00/e0 Emask 0x9 (media error)
Nov 12 11:37:03 nas0 kernel: ata1.00: status: { DRDY ERR }
Nov 12 11:37:03 nas0 kernel: ata1.00: error: { UNC }
Nov 12 11:37:03 nas0 kernel: ata1.00: configured for UDMA/133
Nov 12 11:37:03 nas0 kernel: ata1: EH complete
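
For anyone who wants to see which sector the drive is complaining about, the `cmd` and `res` lines can be decoded. This is only a sketch: it assumes the usual libata layout of twelve hex bytes (command or status, feature or error, nsect, lbal, lbam, lbah, then the "hob" high-order copies, then device), and the function name `decode_taskfile` is made up for illustration.

```python
import re

# Twelve two-digit hex bytes separated by "/" or ":", following "cmd" or "res".
TASKFILE = re.compile(r"\b(cmd|res)\s+([0-9a-f]{2}(?:[/:][0-9a-f]{2}){11})")


def decode_taskfile(line):
    """Return the 48-bit LBA and sector-count bytes from a libata cmd/res line."""
    m = TASKFILE.search(line)
    if m is None:
        return None
    f = [int(x, 16) for x in re.split(r"[/:]", m.group(2))]
    # Assumed field order:
    # 0 command/status, 1 feature/error, 2 nsect, 3 lbal, 4 lbam, 5 lbah,
    # 6 hob_feature, 7 hob_nsect, 8 hob_lbal, 9 hob_lbam, 10 hob_lbah, 11 device
    lba = f[3] | f[4] << 8 | f[5] << 16 | f[8] << 24 | f[9] << 32 | f[10] << 40
    nsect = f[2] | f[7] << 8
    return {"kind": m.group(1), "lba": lba, "nsect": nsect}


cmd_line = "Nov 12 11:37:00 nas0 kernel: ata1.00: cmd 25/00:00:40:f2:08/00:04:00:00:00/e0 tag 0 dma 524288 in"
res_line = "Nov 12 11:37:00 nas0 kernel:          res 51/40:7f:b8:f5:08/40:00:00:00:00/e0 Emask 0x9 (media error)"

request = decode_taskfile(cmd_line)
result = decode_taskfile(res_line)
print(request["lba"], request["nsect"])  # 586304 1024
print(result["lba"])                     # 587192
```

Under that assumption, the read covered 1024 sectors starting at LBA 586304 (1024 x 512 bytes matches the "dma 524288" in the log), and the drive reported the uncorrectable sector at LBA 587192, which falls inside that range.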

And then some more on Nov 27:

Nov 27 11:17:10 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Nov 27 11:17:10 nas0 kernel:          res 51/40:08:c0:61:c7/40:00:d5:00:00/e0 Emask 0x9 (media error) (Errors)
Nov 27 11:17:10 nas0 kernel: ata3.00: error: { UNC } (Errors)

And then a whole bunch on Dec 20:

Dec 20 16:33:08 nas0 apcupsd[1706]: Power failure.
Dec 20 16:33:10 nas0 apcupsd[1706]: Power is back. UPS running on mains.
Dec 20 17:22:58 nas0 mountd[30357]: authenticated mount request from 192.168.210.40:52483 for /mnt/user/media (/mnt/user/media)
Dec 20 17:26:45 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 20 17:26:45 nas0 kernel: ata3.00: BMDMA stat 0x25
Dec 20 17:26:45 nas0 kernel: ata3.00: failed command: READ DMA EXT
Dec 20 17:26:45 nas0 kernel: ata3.00: cmd 25/00:00:a0:4e:ff/00:04:d5:00:00/e0 tag 0 dma 524288 in
Dec 20 17:26:45 nas0 kernel:          res 51/40:bf:d8:51:ff/40:00:d5:00:00/e0 Emask 0x9 (media error)
Dec 20 17:26:45 nas0 kernel: ata3.00: status: { DRDY ERR }
Dec 20 17:26:45 nas0 kernel: ata3.00: error: { UNC }
Dec 20 17:26:45 nas0 kernel: ata3.00: configured for UDMA/133
Dec 20 17:26:45 nas0 kernel: ata3: EH complete
Dec 20 17:26:54 nas0 kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 20 17:26:54 nas0 kernel: ata3.00: BMDMA stat 0x25
Dec 20 17:26:54 nas0 kernel: ata3.00: failed command: READ DMA EXT
Dec 20 17:26:54 nas0 kernel: ata3.00: cmd 25/00:00:38:24:01/00:04:d6:00:00/e0 tag 0 dma 524288 in
Dec 20 17:26:54 nas0 kernel:          res 51/40:5f:d8:25:01/40:02:d6:00:00/e0 Emask 0x9 (media error)
Dec 20 17:26:54 nas0 kernel: ata3.00: status: { DRDY ERR }
Dec 20 17:26:54 nas0 kernel: ata3.00: error: { UNC }
Dec 20 17:26:54 nas0 kernel: ata3.00: configured for UDMA/133
Dec 20 17:26:54 nas0 kernel: ata3: EH complete
Dec 20 17:31:11 nas0 apcupsd[1706]: Power failure.
Dec 20 17:31:13 nas0 apcupsd[1706]: Power is back. UPS running on mains.
.
(more errors omitted -- see full syslog)
.
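
To get a quick picture of how the errors are spread across ports and days, the `error: { UNC }` lines in a syslog can be tallied. The following is a rough sketch that assumes the syslog line format shown in the excerpts above; `count_unc_errors` is a hypothetical helper name, and the sample lines are copied from this post.

```python
import re
from collections import Counter

# Captures the date ("Nov 12") and the ATA port ("ata1") of each UNC error line.
UNC = re.compile(r"^(\w{3}\s+\d+) \S+ \S+ kernel: (ata\d+)\.\d+: error: \{ UNC \}")


def count_unc_errors(lines):
    """Count uncorrectable-read errors per (port, date) pair."""
    counts = Counter()
    for line in lines:
        m = UNC.match(line)
        if m:
            counts[(m.group(2), m.group(1))] += 1
    return counts


sample = [
    "Nov 12 11:37:00 nas0 kernel: ata1.00: error: { UNC }",
    "Nov 12 11:37:00 nas0 kernel: ata1: EH complete",
    "Nov 12 11:37:03 nas0 kernel: ata1.00: error: { UNC }",
    "Dec 20 17:26:45 nas0 kernel: ata3.00: error: { UNC }",
    "Dec 20 17:26:54 nas0 kernel: ata3.00: error: { UNC }",
]

for (port, day), n in sorted(count_unc_errors(sample).items()):
    print(port, day, n)
```

To run it against a full syslog, pass an open file object instead of the sample list, since iterating a file yields one line at a time.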

I seem to have had some power issues on Dec 20, not sure if that had anything to do with it... I have the system connected to an APC battery backup, and the system stayed running for the entire duration.

If anyone is available to help me on what my next step should be, I'd greatly appreciate it.

System specs:

CPU: AMD Athlon64 4000+

Motherboard: MSI K8N Neo4 Platinum

Power Supply: Ultra X4 750W

RAM: 2 GB DDR

Storage: Parity - Western Digital 2TB SATA

Storage: Disk1 - Western Digital 2TB SATA

Storage: Disk2 - Western Digital 750GB SATA

Storage: Disk3 - Western Digital 1TB SATA

Storage: Disk4 - Seagate 1.5 TB SATA

Storage: Disk5 - Western Digital 2TB SATA

syslog-2013-01-04.zip

SMART-parity.txt

SMART-disk5.txt

January 4, 2013

What are the specs of your unRAID server?

Have you run a SMART test on those two drives?

Post the SMART reports so people can check them as well.

I'm new to unRAID, but these are the steps I would take.

January 4, 2013

Author

Thanks -- I modified the original post to include that information. System specs are at the bottom and SMART test reports are attached as .txt files.

January 4, 2013

Start replacement on both disks...

The first thing to check is whether the error counts keep increasing. If they remain stable, the disks may still be fine, but in my experience errors like these mean that replacements are in order.

January 5, 2013

Both drives have unreadable sectors, indicated by the non-zero Current_Pending_Sector counts, and need to be rebuilt. Can you copy all of the data off of disk 5?

January 5, 2013

Author

That's exactly what I'm doing right now. Getting everything moved off of it onto the other drives, which luckily have plenty of free space to accommodate what I've got on disk 5.

What would the process be after that's done?

1) I'd have to get disk5 either removed from the array or replaced with a new undamaged drive.

2) Either way, I'd have to rebuild parity, is that right? Which means I'd need to get the parity drive replaced beforehand.

January 5, 2013

The drives are probably ok. They should be pre-cleared and then the SMART reports checked for any remaining pending sectors. If sectors are still pending after a pre-clear then they should be retired or RMAed. If there are no pending sectors and the overall status is PASSED then use the drives.
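
The check described above can be sketched as a small script that reads a saved SMART report. It assumes the usual `smartctl` text layout, where the raw value is the tenth column of the attribute row and the health line reads "self-assessment test result: PASSED". The function names are hypothetical, and the sample report is invented for illustration; it is not taken from the attached files.

```python
import re


def pending_sectors(report):
    """Return the raw Current_Pending_Sector value, or None if the row is absent."""
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


def overall_passed(report):
    """True if the overall-health line reports PASSED."""
    return re.search(r"self-assessment test result:\s*PASSED", report) is not None


def verdict(report):
    """Apply the rule of thumb from the post above to a post-pre-clear report."""
    pending = pending_sectors(report)
    if pending is None:
        return "unknown"
    if pending > 0:
        return "retire or RMA"
    # Zero pending but not PASSED is not covered by the post; flagging it
    # for a closer look is an assumption made here.
    return "use" if overall_passed(report) else "inspect further"


# Hypothetical report fragment, for illustration only.
sample_report = """\
SMART overall-health self-assessment test result: PASSED
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
"""

print(verdict(sample_report))  # retire or RMA
```

The idea is to run this on a report taken after the pre-clear finishes, since the pre-clear writes to every sector and gives the drive a chance to remap or clear the pending ones.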

January 5, 2013

Author

Thanks for the replies, I appreciate it!

What should be my first step once I get all of my data moved off of disk5? Something like stopping the array and removing those two drives from the configuration so I can do the pre-clears?

-----------------------------------------------

EDIT

OK so here's the situation and here's what I think I want to do about it.

-) I've got a new drive on order (2 TB) that should be here on Tuesday. I'd like to use this drive to replace my current parity drive.

-) I'm currently making sure that I have everything off of my disk5 data disk. What I'd like to do with this, for now, is to just remove it from the array (with no immediate need to re-add it as I've got plenty of room on my other 4 data drives for now)

-) After I get the parity drive swapped, and the disk5 drive removed, I can start pre-clears on them (either on the same system or on an alternate one) so that I can determine if I need to RMA them. Luckily they're both still under warranty (whew!)

The big question is, what order do I do everything in? It's almost like I want to just start a new array, using existing data on my 4 data drives. I'll just effectively be building a brand new parity disk based on those 4.

I don't know if this is correct, so please correct any of these steps for me or let me know what I should do instead.

1) Stop the array, remove parity and disk5 from the configuration ?

2) Shut system down, put new parity drive in, start system back up. I'm thinking since the configuration is missing parity, the array won't auto-start ?

3) Run pre-clear on new parity drive ?

4) Do an initconfig to wipe the array configuration (after backing up configs, noting all of my configuration settings, drive mappings, etc.) ?

5) Add new parity drive into the array in the parity slot ?

6) Start the array so it can now re-build parity on the brand new drive ?

Again, I have question marks behind all of those steps because I'm looking for someone to either confirm or correct me on what you think I should do.

Any help is greatly appreciated!

- Ryan

January 5, 2013

Yes to all.

January 6, 2013

One last step after you have successfully completed your list. You should do a non-correcting parity check to verify that parity was written correctly and is completely readable. A parity check is extra insurance that everything is working correctly, and goes through mostly the same mechanics needed to rebuild 1 failed drive, so if the check completes with zero errors, you should be good to go.
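
To illustrate why a parity check exercises nearly the same path as a rebuild, here is a toy sketch of single XOR parity. Short byte strings stand in for whole disks, and the helper names are made up; this shows the idea only and is not how unRAID is implemented internally.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_ok(data_disks, parity):
    """A parity check: recompute parity from all data disks and compare."""
    return xor_blocks(data_disks) == parity


def rebuild(surviving_disks, parity):
    """A rebuild: XOR the surviving disks with parity to recover the missing one."""
    return xor_blocks(surviving_disks + [parity])


disk1 = b"\x0f\xa0"
disk2 = b"\x33\x0c"
disk3 = b"\x55\xff"

parity = xor_blocks([disk1, disk2, disk3])
print(parity_ok([disk1, disk2, disk3], parity))  # True
print(rebuild([disk1, disk3], parity) == disk2)  # True
```

Both operations read every disk and XOR the results, so a clean check is good evidence that every sector a rebuild would need is readable.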

January 7, 2013

Author

Thanks, I'll add that to my list.

I usually try to do one of those once every month, but time makes fools of us all! I may set that up to be automatic after I get everything done.

January 10, 2013

Author

Thanks for all of your help guys.

I got the two bad drives removed from the system, my new replacement parity drive passed pre-clear, and I'm currently rebuilding parity.

Thanks again!
