April 6, 2010

My four-day-old unRAID server is now stuck on "Starting...". The drives won't mount, and errors are showing on the parity drive. I cannot do anything in the web interface except Refresh, which updates the numbers. How do I resolve this?

Screenshot of the web page: http://imgur.com/Vk6Hm.png

Syslog error, repeated over and over for hours:

Apr 6 21:25:36 Tower kernel: handle_stripe read error: 2193044176/0, count: 1
Apr 6 21:25:40 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 21:25:40 Tower kernel: ata1.00: irq_stat 0x40000001
Apr 6 21:25:40 Tower kernel: ata1.00: failed command: READ DMA EXT
Apr 6 21:25:40 Tower kernel: ata1.00: cmd 25/00:08:6f:52:b7/00:00:82:00:00/e0 tag 0 dma 4096 in
Apr 6 21:25:40 Tower kernel: res 51/40:00:6f:52:b7/00:00:82:00:00/e0 Emask 0x9 (media error)
Apr 6 21:25:40 Tower kernel: ata1.00: status: { DRDY ERR }
Apr 6 21:25:40 Tower kernel: ata1.00: error: { UNC }
Apr 6 21:25:40 Tower kernel: ata1.00: configured for UDMA/133
Apr 6 21:25:40 Tower kernel: ata1: EH complete
Apr 6 21:25:42 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 21:25:42 Tower kernel: ata1.00: irq_stat 0x40000001
Apr 6 21:25:42 Tower kernel: ata1.00: failed command: READ DMA EXT
Apr 6 21:25:42 Tower kernel: ata1.00: cmd 25/00:08:6f:52:b7/00:00:82:00:00/e0 tag 0 dma 4096 in
Apr 6 21:25:42 Tower kernel: res 51/40:00:6f:52:b7/00:00:82:00:00/e0 Emask 0x9 (media error)
Apr 6 21:25:42 Tower kernel: ata1.00: status: { DRDY ERR }
Apr 6 21:25:42 Tower kernel: ata1.00: error: { UNC }
Apr 6 21:25:42 Tower kernel: ata1.00: configured for UDMA/133
Apr 6 21:25:42 Tower kernel: ata1: EH complete
Apr 6 21:25:45 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 21:25:45 Tower kernel: ata1.00: irq_stat 0x40000001
Apr 6 21:25:45 Tower kernel: ata1.00: failed command: READ DMA EXT
Apr 6 21:25:45 Tower kernel: ata1.00: cmd 25/00:08:6f:52:b7/00:00:82:00:00/e0 tag 0 dma 4096 in
Apr 6 21:25:45 Tower kernel: res 51/40:00:6f:52:b7/00:00:82:00:00/e0 Emask 0x9 (media error)
Apr 6 21:25:45 Tower kernel: ata1.00: status: { DRDY ERR }
Apr 6 21:25:45 Tower kernel: ata1.00: error: { UNC }
Apr 6 21:25:45 Tower kernel: ata1.00: configured for UDMA/133
Apr 6 21:25:45 Tower kernel: ata1: EH complete
Apr 6 21:25:47 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Apr 6 21:25:47 Tower kernel: ata1.00: irq_stat 0x40000001
Apr 6 21:25:47 Tower kernel: ata1.00: failed command: READ DMA EXT
Apr 6 21:25:47 Tower kernel: ata1.00: cmd 25/00:08:6f:52:b7/00:00:82:00:00/e0 tag 0 dma 4096 in
Apr 6 21:25:47 Tower kernel: res 51/40:00:6f:52:b7/00:00:82:00:00/e0 Emask 0x9 (media error)
Apr 6 21:25:47 Tower kernel: ata1.00: status: { DRDY ERR }
Apr 6 21:25:47 Tower kernel: ata1.00: error: { UNC }
Apr 6 21:25:47 Tower kernel: ata1.00: configured for UDMA/133
Apr 6 21:25:47 Tower kernel: ata1: EH complete
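Every repetition in the log above fails on the same "cmd" bytes, which suggests the drive keeps retrying one unreadable spot. The sketch below decodes those bytes into a sector address. It assumes the field order that the Linux libata driver uses when it prints taskfile registers (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); that layout is not stated anywhere in this thread, and the function name `decode_lba48` is made up for illustration.

```python
import re

# Assumed libata taskfile layout (not stated in the thread):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE = re.compile(
    r"(?:cmd|res) ([0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/([0-9a-f]{2})"
)


def decode_lba48(line):
    """Return the 48-bit LBA encoded in a libata 'cmd' or 'res' line, or None."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    low = match.group(2).split(":")   # feature, nsect, lbal, lbam, lbah
    high = match.group(3).split(":")  # hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah
    # Assemble the six LBA bytes, most significant first.
    lba_bytes = [high[4], high[3], high[2], low[4], low[3], low[2]]
    return int("".join(lba_bytes), 16)


def sector_count(line):
    """Return the number of sectors requested (low nsect byte only), or None."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    return int(match.group(2).split(":")[1], 16)


cmd_line = "ata1.00: cmd 25/00:08:6f:52:b7/00:00:82:00:00/e0 tag 0 dma 4096 in"
print(decode_lba48(cmd_line), sector_count(cmd_line))
```

Under that assumed layout the request covers 8 sectors, and 8 x 512 bytes is 4096, which agrees with the "dma 4096 in" at the end of the same log line.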
April 6, 2010

One of your disks is reporting media errors. It cannot read some of its own sectors. The disk may have crashed, or the cause might be the SATA cable, or possibly even the disk controller, but most likely it is the disk itself.

It is the disk affiliated with ata1.00. We would need to see the entire syslog to know which disk that is.

You can also try getting a SMART report from each of your disks:

smartctl -a -d ata /dev/sdX

where sdX is the device for each of your disks in turn. You are looking for re-allocated sectors in the report, or sectors pending re-allocation.

Joe L.
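The check described above can be scripted. This is a minimal sketch, assuming the attribute rows have the ten-column layout of the two rows quoted later in this thread. The helper names and the device list passed to `check_disks` are hypothetical, and `smartctl` is invoked with exactly the options given in the post.

```python
import subprocess

# Attributes the post says to look for.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} for the watched attributes."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Assumed row layout: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            found[fields[1]] = int(fields[9])
    return found


def smart_report(device):
    """Run the command from the post and return its text output."""
    result = subprocess.run(
        ["smartctl", "-a", "-d", "ata", device],
        capture_output=True,
        text=True,
    )
    return result.stdout


def check_disks(devices):
    """Print the watched raw counts for each device, e.g. ['/dev/sda']."""
    for device in devices:
        print(device, parse_smart_attributes(smart_report(device)))


# The two attribute rows quoted later in this thread, used as sample input.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0033 062 062 010 Pre-fail Always - 1643
197 Current_Pending_Sector 0x0012 095 095 000 Old_age Always - 225
"""
print(parse_smart_attributes(SAMPLE))
```

On a healthy disk both raw values would normally be 0, so any non-zero count is worth a closer look. `check_disks` is not called here because it needs root access and real devices.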
April 6, 2010 Author

The array has now come online. It automatically started a parity check. I've attached both the full syslog and the SMART report for the parity drive (sda).

Parity-Check in progress. Cancel will stop the Parity-Check.
Total size: 1,465,138,552 KB
Current position: 174,384,180 (11.9%)
Estimated speed: 19,809 KB/sec
Estimated finish: 1085.7 minutes
Sync errors: 338

Attachments: syslog.zip, smart.txt
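As a sanity check, the progress figures above are consistent with each other. The short sketch below recomputes the percentage and the remaining time from the reported size, position, and speed. It assumes the estimate is simply the remaining KB divided by the current speed, which is a guess about how the web interface derives it.

```python
# Figures copied from the status page above.
total_kb = 1_465_138_552
position_kb = 174_384_180
speed_kb_per_sec = 19_809

# Fraction of the parity check already completed.
percent_done = 100 * position_kb / total_kb

# Assumed formula: remaining data divided by the current speed.
remaining_minutes = (total_kb - position_kb) / speed_kb_per_sec / 60

print(f"{percent_done:.1f}% done, about {remaining_minutes:.0f} minutes remaining")
```

This gives 11.9% and roughly 1086 minutes (about 18 hours), in line with the 1085.7 minutes reported. The speed of under 20 MB/sec fits a drive that is repeatedly retrying unreadable sectors.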
April 6, 2010

Your parity disk is failing pretty badly with media errors. The SMART report shows 1643 sectors re-allocated, with 225 pending re-allocation:

5 Reallocated_Sector_Ct 0x0033 062 062 010 Pre-fail Always - 1643
197 Current_Pending_Sector 0x0012 095 095 000 Old_age Always - 225

Typically there are several thousand spare sectors on a large drive, but most unRAID users would consider this disk to have failed. The current "normalized" value in the SMART report of 62 is getting close to the failure threshold of 10, but I'd consider the disk as failed now.

I'd be looking for a replacement disk. This is not the time to wait for a sale.

The parity check will attempt to read all the sectors on the parity disk, and it will "correct" those that are reported as unreadable by writing to them, which allows the re-allocation to take place. It has already corrected 338 errors, and I'd expect a lot more.

As I said, time for a replacement disk.

Joe L.
April 6, 2010 Author

I stopped the array, unassigned the drive on the devices page, and restarted the array. Currently I am running without parity, but I still have all the data on my other PC as well. I'll have a new drive in there in a few days. Thanks for the help.
April 6, 2010

Did you preclear your disks before adding them to your server? It is strongly advised before putting your precious data on them. (It helps weed out disks with early failures and with unreadable sectors.)

Find the preclear_disk.sh script here: http://lime-technology.com/forum/index.php?topic=2817.0

As long as your data is safe elsewhere, pre-clear the new parity disk before you assign it to the array.

Joe L.