Very long parity sync...

Robertxc · October 3, 2009

I've done a complete fresh install of Unraid and, having assigned all the drives, I am now doing the parity sync. It has now been syncing for over 24 hours and it says it still has 1810 minutes to go!! Surely this is not normal? Also, all the drives except the parity are spun down, and judging by the temperature of the parity drive, it doesn't seem to be doing much either! Any suggestions?

Syslog is here: http://pastebin.com/mf286b91

NLS · October 3, 2009

Is the "current position" moving?

If not, I would restart the whole thing.

Robertxc · October 3, 2009

Yes, it is moving...barely.

Robertxc · October 3, 2009

Would it help if I stopped the array? Would the parity sync continue?

purko · October 3, 2009

Hi Robertxc!

Some things in the syslog look suspicious to me...

Oct 2 15:19:37 Tower emhttp: shcmd (41): mkdir /mnt/disk2
Oct 2 15:19:37 Tower emhttp: shcmd (41): mkdir /mnt/disk1

Oct 2 15:19:37 Tower kernel: mdcmd (7): check

Oct 2 15:19:37 Tower kernel: md: recovery thread woken up ...

Oct 2 15:19:37 Tower kernel: md: recovery thread syncing parity disk ...

Oct 2 15:19:37 Tower emhttp: shcmd (42): mount -t reiserfs -o noacl,nouser_xattr,noatime,nodiratime /dev/md1 /mnt/disk1 >/dev/null 2>&1

Oct 2 15:19:37 Tower emhttp: shcmd (42): mkdir /mnt/disk3

The above shows that the system started the parity sync before it had even mounted all the disks. I am not sure whether that is normal or not.

Later in the syslog I notice that the system is putting the disks to sleep:

Oct 3 00:48:09 Tower emhttp: shcmd (63): /usr/sbin/hdparm -y /dev/sda >/dev/null
Oct 3 00:48:09 Tower emhttp: shcmd (64): /usr/sbin/hdparm -y /dev/sdd >/dev/null

Oct 3 00:48:09 Tower emhttp: shcmd (65): /usr/sbin/hdparm -y /dev/sdc >/dev/null

Oct 3 00:48:09 Tower emhttp: shcmd (66): /usr/sbin/hdparm -y /dev/sde >/dev/null

Oct 3 00:48:09 Tower emhttp: shcmd (67): /usr/sbin/hdparm -y /dev/sdb >/dev/null

Now this is not right, assuming that the parity sync is running and reading all the disks the whole time. This is probably the buggy part.

On my system I have set up unraid NOT to put the disks to sleep. My reasoning is that if a disk can be put to sleep, then it can do so by itself and doesn't need a command telling it when. All it needs is for its inactivity sleep timer to be set. So in my 'go' script I have the following line:

for i in /dev/[sh]d? ; do hdparm -S200 $i >/dev/null 2>&1 ; done
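
For reference, the number after -S is not in minutes. As far as I understand the hdparm man page, values 1 to 240 count in units of 5 seconds and values 241 to 251 count in units of 30 minutes, so -S200 should mean the disk spins itself down after 1000 seconds of inactivity (about 17 minutes). A small sketch of that conversion follows; the function name standby_seconds is made up for illustration, and the special values (0, 252, 253, 255) are deliberately not handled:

```shell
# Convert an hdparm -S value to an idle timeout in seconds.
# Only the two simple ranges are covered; anything else prints "unsupported".
standby_seconds() {
    v=$1
    if [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                 # 1..240: units of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 1800 ))      # 241..251: units of 30 minutes
    else
        echo "unsupported"
    fi
}

standby_seconds 200   # prints 1000
```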

Also, I notice that your unraid is constantly trying to talk with some DHCP server:

Oct 2 23:17:33 Tower dhcpcd[1403]: DHCP_ACK received from (192.168.0.1)
Oct 2 23:47:33 Tower dhcpcd[1403]: sending DHCP_REQUEST for 192.168.0.190 to 192.168.0.1

Oct 2 23:47:34 Tower dhcpcd[1403]: dhcpIPaddrLeaseTime=3600 in DHCP server response.

Oct 2 23:47:34 Tower dhcpcd[1403]: dhcpT1value is missing in DHCP server response. Assuming 1800 sec

Oct 2 23:47:34 Tower dhcpcd[1403]: dhcpT2value is missing in DHCP server response. Assuming 3150 sec

Oct 2 23:47:34 Tower dhcpcd[1403]: DHCP_ACK received from (192.168.0.1)

Oct 3 00:17:34 Tower dhcpcd[1403]: sending DHCP_REQUEST for 192.168.0.190 to 192.168.0.1

Oct 3 00:17:35 Tower dhcpcd[1403]: dhcpIPaddrLeaseTime=3600 in DHCP server response.

Oct 3 00:17:35 Tower dhcpcd[1403]: dhcpT1value is missing in DHCP server response. Assuming 1800 sec

Oct 3 00:17:35 Tower dhcpcd[1403]: dhcpT2value is missing in DHCP server response. Assuming 3150 sec

Why do you need that? Just set up its IP address manually, and tell it not to use DHCP.

All this said, I may still be missing your real problem. Maybe somebody more knowledgeable can chime in.

Yours,

Purko

Robertxc · October 3, 2009

Thanks Purko. I think I might just reboot and start again. There's no particular reason why I set it to get its IP address automatically; I am running a DHCP server anyway, and all the other machines on my network use it. I could just as easily give it a fixed one. I didn't think it would cause any issues for Unraid.

prostuff1 · October 3, 2009

It usually works better if you set the server to get the same IP address on every boot. I do this by reserving the MAC address of the server in my router and telling it to assign the server a certain IP. In other words, I leave the server on DHCP but tell the router that when it sees a certain MAC it needs to assign it this IP address.

Joe L. · October 3, 2009

If your disks are different sizes, then the smaller ones will go to sleep once they are no longer being read as part of the parity calculations.

What will provide clues for people to give better assistance is for you to post a copy of your syslog... otherwise, we are just guessing.

All it would take is one disk misbehaving to get the times you are seeing. And no, with a 2TB set of disks and a PCI-only bus, it could take 24 hours or more.
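
As a rough back-of-the-envelope check on that figure: a 32-bit/33 MHz PCI bus tops out at a theoretical 133 MB/s, and that is shared by every drive being read or written at the same moment. The numbers below (a 1500GB parity drive and 8 drives on the bus) are purely illustrative assumptions, not taken from any syslog, and real PCI throughput is lower than the theoretical figure:

```shell
# Rough estimate of parity sync time when a shared PCI bus is the bottleneck.
# Args: parity size in GB, number of drives sharing the bus, bus speed in MB/s.
sync_hours() {
    size_gb=$1; drives=$2; bus_mbs=$3
    # Each drive gets bus_mbs/drives MB/s, so time = size / (bus / drives).
    secs=$(( size_gb * 1000 * drives / bus_mbs ))
    echo $(( secs / 3600 ))   # whole hours, rounded down
}

sync_hours 1500 8 133   # prints 25
```

With mixed drive sizes the smaller drives drop out part of the way through, so the later stretch of the sync has fewer drives competing for the bus and this simple estimate will be on the high side.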

Joe L.

Robertxc · October 3, 2009

Do you mean a syslog in addition to the one I linked to in my first post? I have since restarted the parity sync and (so far) it seems to be behaving normally. The most up to date (since restarting the parity sync) syslog is here: http://pastebin.com/m7dc5a899

RobJ · October 4, 2009

Your syslogs look pretty good, until it fails to finish the parity build, and that is probably going to remain a mystery. You have restarted, and another parity build has begun, but if it fails too, then I recommend that you obtain a SMART report for your parity drive. The parity build proceeded without issue through all of the 750GB drives (at the 50% point), then spun them down, then finished the four 1TB drives (at the 67% point) and spun them down, then had nothing left to do but zero out all remaining sectors on the parity drive. It did start that, but only got to 73% before something went very wrong, yet there are absolutely NO errors reported. There is possibly a problem within the drive at the high end, so perhaps the SMART report will show what is wrong. I have never seen a parity build fail at this point before, and I don't have any ideas other than a possible problem at the high end of the drive. Depending on what the SMART report reveals, you should probably try Preclearing the parity drive.
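
To illustrate where those percentages come from: the parity build covers the whole parity drive, so a data drive is finished once the build has passed that drive's capacity. Assuming a 1.5TB (1500GB) parity drive, which is what the 50% and 67% figures imply but which is my inference rather than something read off the syslog, the arithmetic looks like this (done_at_percent is just an illustrative name):

```shell
# Percentage of the parity build at which a data drive of a given size is done.
# Args: data drive size in GB, parity drive size in GB (1500 is an assumption).
done_at_percent() {
    echo $(( ($1 * 100 + $2 / 2) / $2 ))   # rounded to the nearest percent
}

done_at_percent 750 1500    # prints 50
done_at_percent 1000 1500   # prints 67
```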

Concerning DHCP, you have changed it to a static IP, which I too recommend, but you may still want to check your DHCP server, wherever it is (probably on the router), as it looks misconfigured. It currently appears to require renewal every 30 minutes, which is a bit ridiculous. Although this is up to your personal preference, I would recommend setting the lease to at least one or two days, preferably one or two weeks. It no longer matters to your unRAID server, but it still affects any other networked machines that may be using it. It is only a minor issue, though.
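
The 30-minute figure follows from the lease time in the syslog. When a server does not send T1 and T2, the client falls back to the defaults from RFC 2131: renew at 50% of the lease and rebind at 87.5%. With dhcpIPaddrLeaseTime=3600 that gives the 1800 and 3150 seconds that dhcpcd says it is assuming. A quick sketch of the calculation (the function name is illustrative only):

```shell
# Default DHCP renewal (T1) and rebinding (T2) times for a lease, in seconds.
dhcp_timers() {
    lease=$1
    echo "T1=$(( lease / 2 )) T2=$(( lease * 7 / 8 ))"
}

dhcp_timers 3600     # prints T1=1800 T2=3150
dhcp_timers 604800   # one-week lease: prints T1=302400 T2=529200
```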

One other possible issue, and I don't know if you want to mess with it: Disk 6 (sde) is actually a 1TB drive, but has been configured as a 750GB drive. You can easily confirm this by checking the model numbers on your Web Management screen, and the syslog confirms there is an HPA covering the last 250GB. This probably indicates a drive that was sold as a 750GB drive at a time when the manufacturer had a surplus of 1TB drives and simply configured it to look like a 750GB. If you use one of the tools to raise its capacity back to 1TB, you will have to test it thoroughly, to make sure there are no flaws out there that were the reason for 'shorting' it. You would also have to rebuild parity again if you change the drive's size (but you may have to rebuild parity again anyway).
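
If you want to see how much the HPA is hiding, hdparm -N /dev/sde should report the visible and the native sector counts. Below is a small sketch of the arithmetic, assuming 512-byte sectors. The two sector counts in the example are typical values for 750GB and 1TB drives that I am using for illustration, not numbers taken from your syslog:

```shell
# Size in GB (decimal) hidden by an HPA.
# Args: visible max sectors, native max sectors (the two values hdparm -N reports).
hpa_hidden_gb() {
    echo $(( ($2 - $1) * 512 / 1000000000 ))
}

hpa_hidden_gb 1465149168 1953525168   # prints 250
```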
