lloydsmart Posted December 6, 2011

Hi,

As the title suggests, I'm having some trouble with the parity sync. Basically, the situation is, I upgraded recently from 5.0b12a (which was working perfectly) to 5.0b14, and stupidly I formatted the flash drive instead of just overwriting files with new ones. This, of course, meant all my settings were gone, and when I booted up, unRAID wanted to do a new parity sync. I didn't realize this was happening until it was too late, and now I'm scared to stop it in case it really will leave the array unprotected. I have a backup of my original b12a USB stick if needed.

The problem I'm having is that the parity sync is running at about 512 KB/s. At this rate, it will take over a month to complete for me. I have 7 data drives of various sizes, a 3TB parity drive, and a 150GB cache drive (10k RPM, SATA-6). Nothing is on PCI, which seems to always be the cause of similar problems to this on these forums, so I'm stumped.

What can I do? I thought about stopping the sync, reformatting the USB stick, and restoring my b12a installation, but now I'm worried about the parity. What effect will half a re-calculated parity sync have on the reliability of the array? I guess in theory it's just overwriting the drive with the same data again, but how can I be sure?

Any help greatly appreciated, whether it's just "Yeah, it's safe to go back to b12a" or "this is how to speed up the sync".

I tried to attach a syslog, but it's just a touch too big at 196k. The limit on this board is 192k. So here's a link to it on my dropbox: http://db.tt/aA9uinKn

Thanks in advance!
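For a rough sense of scale, here is a minimal sketch of the arithmetic behind the "over a month" estimate. It assumes the reported rate means 512 kilobytes per second, that the sync has to pass over the full 3 TB parity drive at a constant rate, and that 75 MB/s is a plausible healthy rate for comparison. All three are assumptions for illustration, not measurements.

```python
def sync_duration_days(parity_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to pass over the whole parity drive at a constant rate."""
    seconds_per_day = 86_400
    return parity_bytes / rate_bytes_per_s / seconds_per_day


# Assumption: 3 TB in decimal units, the way drive capacities are usually quoted.
PARITY_BYTES = 3e12

# Assumption: "512 KB/s" means 512 * 1024 bytes per second.
slow = sync_duration_days(PARITY_BYTES, 512 * 1024)

# Assumption: 75 MB/s as a comparison rate for a sync with no read errors.
fast = sync_duration_days(PARITY_BYTES, 75e6)

print(f"{slow:.0f} days at 512 KB/s, {fast * 24:.1f} hours at 75 MB/s")
```

Under these assumptions the slow rate works out to roughly two months, so a sync crawling at this speed points to a fault rather than to normal behaviour.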
Joe L. Posted December 6, 2011

You are in pretty deep. Did you even look at the syslog? You have a failing disk2.
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128344/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128352/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128360/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128368/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128376/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128384/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128392/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128400/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128408/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128416/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128424/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128432/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128440/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128448/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128456/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128464/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128472/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128480/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128488/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error
Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128496/2, count: 1
Dec 6 06:58:49 Tower kernel: md: disk2 read error

It is failing with many unreadable sectors:

Dec 6 12:22:01 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 6 12:22:01 Tower kernel: ata6.00: irq_stat 0x40000001
Dec 6 12:22:01 Tower kernel: ata6.00: failed command: READ DMA EXT
Dec 6 12:22:01 Tower kernel: ata6.00: cmd 25/00:00:a8:90:d4/00:04:25:00:00/e0 tag 0 dma 524288 in
Dec 6 12:22:01 Tower kernel: res 51/40:cf:d0:90:d4/00:03:25:00:00/e0 Emask 0x9 (media error)
Dec 6 12:22:01 Tower kernel: ata6.00: status: { DRDY ERR }
Dec 6 12:22:01 Tower kernel: ata6.00: error: { UNC }
Dec 6 12:22:01 Tower kernel: ata6.00: configured for UDMA/133
Dec 6 12:22:01 Tower kernel: ata6: EH complete
Dec 6 12:22:15 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 6 12:22:15 Tower kernel: ata6.00: irq_stat 0x40000001
Dec 6 12:22:15 Tower kernel: ata6.00: failed command: READ DMA EXT
Dec 6 12:22:15 Tower kernel: ata6.00: cmd 25/00:00:a8:90:d4/00:04:25:00:00/e0 tag 0 dma 524288 in
Dec 6 12:22:15 Tower kernel: res 51/40:ff:98:91:d4/00:02:25:00:00/e0 Emask 0x9 (media error)
Dec 6 12:22:15 Tower kernel: ata6.00: status: { DRDY ERR }
Dec 6 12:22:15 Tower kernel: ata6.00: error: { UNC }
Dec 6 12:22:15 Tower kernel: ata6.00: configured for UDMA/133
Dec 6 12:22:15 Tower kernel: ata6: EH complete
Dec 6 12:22:18 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 6 12:22:18 Tower kernel: ata6.00: irq_stat 0x40000001
Dec 6 12:22:18 Tower kernel: ata6.00: failed command: READ DMA EXT
Dec 6 12:22:18 Tower kernel: ata6.00: cmd 25/00:00:a8:90:d4/00:04:25:00:00/e0 tag 0 dma 524288 in
Dec 6 12:22:18 Tower kernel: res 51/40:cf:d0:90:d4/00:03:25:00:00/e0 Emask 0x9 (media error)
Dec 6 12:22:18 Tower kernel: ata6.00: status: { DRDY ERR }
Dec 6 12:22:18 Tower kernel: ata6.00: error: { UNC }

You do not currently have parity protection, although IF you have all your drives in their same slots and IF you have not written to the array at all, the parity disk MIGHT be good enough to restore to a replacement disk2 (if you have one available).

Unfortunately, there have been differing degrees of success in forcing an alternate disk as the one to be reconstructed in later beta versions of unRAID. You MUST NOT refresh the browser between using the set invalidslot command and starting the parity check.

I strongly suggest you NOT write to the array at all and contact Tom @ Lime-Tech for specific instructions on how to force a reconstruction of disk2. I suspect it will be similar to these instructions from this post: http://lime-technology.com/forum/index.php?topic=13866.msg131378;topicseen#msg131378

You did not lose your configuration, but by re-formatting, in effect, you did, as unRAID must now create a new super.dat file.

b) Suppose you lost the config, but you know that Parity is valid, so you want to skip the lengthy re-sync. In this case, once you know which disk is Parity, and you have it and all other disks assigned, just prior to clicking the 'Start' button you can type this command in a telnet window:

Code:
mdcmd set invalidslot 99

Now click Start (don't do a refresh between typing this command and clicking Start, or else the command will have no effect). What this does is tell the driver that none of the array drives are invalid, and hence won't start a sync (normally Parity is marked invalid when there's been a "New Config").
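To spot this kind of failure without reading a 196k syslog by eye, the "md: diskN read error" lines can be tallied per disk. The following is a minimal sketch, assuming the log lines look exactly like the ones quoted above; the function name and the regular expression are illustrative and not part of unRAID.

```python
import re
from collections import Counter

# Matches kernel lines such as "Tower kernel: md: disk2 read error".
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error")


def count_md_read_errors(lines):
    """Return a Counter mapping each disk name to its number of md read errors."""
    counts = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the syslog excerpt in this thread.
sample = [
    "Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128344/2, count: 1",
    "Dec 6 06:58:49 Tower kernel: md: disk2 read error",
    "Dec 6 06:58:49 Tower kernel: handle_stripe read error: 359128352/2, count: 1",
    "Dec 6 06:58:49 Tower kernel: md: disk2 read error",
    "Dec 6 12:22:01 Tower kernel: ata6.00: error: { UNC }",
]

for disk, errors in count_md_read_errors(sample).most_common():
    print(f"{disk}: {errors} read errors")
```

Any iterable of lines works as input, so an open file handle for a saved copy of the syslog can be passed in place of the sample list. A single disk dominating the count is a strong hint that the disk itself, rather than the controller or the upgrade, is the cause of the slow sync.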
lloydsmart Posted December 6, 2011

Thanks for the in-depth reply.

I don't have another disk available, but hopefully I can RMA that one as it's only a few months old. I haven't written any data to the array (in fact it appears to be read-only at the moment), but I'm not sure where I stand with the parity disk. It's been doing a parity sync (albeit slowly) for hours now. I haven't changed any connections, so all disks are where they always have been, but I can't be sure whether they're in different menu positions in unRAID. Will this matter?

I had a look at the thread you linked to. If I understand correctly, I need to cancel the currently running parity sync, then use the commands in that thread to make unRAID believe that parity is ok. Then what? Do I do a parity check, or do I try to reconstruct disk2, and just assume that the parity is undamaged?

Thanks for your help!

EDIT: The parity sync has sped up! I don't know why, but it's now running at an incredible 75MB/s! Perhaps this means that it's done with disk2? Or that it's passed the "bad bit"?

EDIT2: Scratch that, it's gone back to 500KB/s.
Joe L. Posted December 6, 2011

Basically, you are overwriting your probably good parity with the probably bad parity calc created as a result of you being unable to read some sectors on disk2. I'd stop the parity calc. It might be only making things worse. I already gave my advice: seek help from Lime-Tech.
lloydsmart Posted December 6, 2011

Thanks for your help. I've contacted Lime-Tech, RMA'd the drive with WD, and cancelled the parity sync. Until my new drive gets here, I'll only be using the array in read-only mode. When the new drive arrives, I'll attempt to reconstruct disk2.

I wonder if it's worth restoring my backup of unRAID 5.0b12a? It would have the old config file, which would mean that unRAID would assume everything is ok with parity etc. If I then unplugged disk2, I may have a chance of accessing the data that was on it, and copying it somewhere else! Of course, the problem is my parity may very well be messed up now. It's been trying to sync for over 24 hours. The other problem is that I don't have anywhere to copy this data to.

On second thoughts, I'll wait until the new drive arrives, pending advice from Lime-Tech.
lloydsmart Posted December 8, 2011

Tom - any comment on this? My RMA replacement drive has shipped so should be with me soon, and then I'll be looking to restore the contents of disk2 onto it, if possible.

I'm thinking I just restore my backup of 5.0b12a (including config files) onto the flash before I start, so the system will think parity is ok? Then I just replace the drive and restore, right? Do you foresee any problems with this?

Theoretically, the newly partially-completed parity that was overwriting my previously good parity should have been pretty much the same, right? So, in theory at least, parity shouldn't have been compromised? I haven't written anything to the array since the incident.
Joe L. Posted December 8, 2011

Quote: "Theoretically, the newly partially-completed parity that was overwriting my previously good parity should have been pretty much the same, right? So, in theory at least, parity shouldn't have been compromised?"

For each block on disk2 that could not be read, I suspect a block of zeros was used in the new parity calculation rather than the actual contents from disk2. Therefore, when you reconstruct disk2 onto its replacement, it might just reconstruct a similar block of zeros. So unfortunately, in theory, parity may have been compromised, even though it may not show errors.
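The effect described above can be shown with a small worked example. This is a minimal sketch under two assumptions: that parity is a plain XOR across the data disks, and that an unreadable block really is replaced by zeros during the re-sync (which is only suspected in this thread, not confirmed). The three tiny "disks" are made-up byte strings.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up 4-byte blocks standing in for the same stripe on three data disks.
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xDE, 0xAD, 0xBE, 0xEF])  # real contents of the failing disk
disk3 = bytes([0x0F, 0xF0, 0x55, 0xAA])

# Parity as it stood before the re-sync, computed from all real data.
good_parity = xor_blocks([disk1, disk2, disk3])

# Parity after a re-sync in which disk2 could not be read.
# Assumption: zeros were substituted for the unreadable block.
bad_parity = xor_blocks([disk1, bytes(4), disk3])

# Rebuilding disk2 means XOR-ing the surviving disks with parity.
rebuilt_from_good = xor_blocks([disk1, disk3, good_parity])
rebuilt_from_bad = xor_blocks([disk1, disk3, bad_parity])

print("good parity restores disk2:", rebuilt_from_good == disk2)
print("bad parity restores zeros: ", rebuilt_from_bad == bytes(4))
```

In this model the overwritten parity is still internally consistent with the zero block, which is why a later parity check would not necessarily flag anything even though the original data for that stripe can no longer be recovered.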
lloydsmart Posted December 8, 2011

Damn, I wish I hadn't let it go on syncing for so long now! Oh well, I'll just have to wait and see I guess. Thanks for all the help.