October 1, 2011

I'm trying to figure out whether my parity issues are being caused by a new disk, a SATA card, a cable, etc.

I had 6 drives working fine connected directly to the motherboard. I had a memory issue a while back when I was running just 3 drives, but the memory was replaced and tested for 24 hours in memtest. Since then I upgraded to 6 disks and everything was working perfectly: I added the drives, calculated parity, transferred a few terabytes to the array, ran a parity check with no errors, transferred several more terabytes, and then ran two more parity checks without errors.

Things started filling up, as they tend to do, so I filled the rest of the case's available drive bays. I added 3 PCI Express SATA cards and 6 drives (5 data, 1 cache), 2 on each card. All of the drives precleared successfully on their respective SATA cards and were added to the array. I started filling them up and one day ran into an issue with an improper shutdown. I ran a parity check and it found a bunch of errors. I tested a bunch of the files and they seemed OK individually, so I had the system correct the parity drive and ran another parity check, and got more errors. I checked all the wiring and reseated all the cards, and again got lots of errors. Next I removed the two still-empty drives from the array, hoping one of them was causing the problems, recreated parity on the new array, and started a check: errors again.

So at this point I'm fairly certain that I have either a bad drive that precleared once without error but is flaking out, a bad SATA card, or a bad cable. I'm fairly sure the memory is OK since I ran the test on it for 24 hours. I think everything else is OK, since the system had no problems before the new drives and cards.

The power supply should be fine handling the load that it has. It's a Corsair CMPSU-650TXV2, single rail, 53 A on the 12 V and 30 A on the 5 V. All the drives are green 2 TB drives and the processor is a 65 watt AMD. I checked the drives and they are rated at 2 amps at spin-up and around 6.3 watts during read/write, but if anyone thinks there may be a power issue when reading all the drives at once, I can give more details.

One thing I'd like to try is to disconnect the original 5 data drives, hook up the 5 new data drives directly to the motherboard using the cables from those drives, pull the cards, rebuild the array, calculate parity, and then run a parity check on that setup. I think if that cleared a few parity checks, it would confirm the problem to be one of the cables or SATA cards I'm using. If I had errors, then I could narrow it down to one of the new drives. The problem (or question, really) is whether I will be able to make a new array using the original 5 data disks again at that point. Basically, can I create an array with disks that were once part of an array, force it to accept those disks with their data intact, and just rebuild the parity drive again?

I'm running 5.12a beta because of the network card that machine has; the Intel NIC I had was not working right. I seem doomed to have most tech I buy arrive broken and never work right at first, but I'm hoping I can get this all resolved.

I attached the most recent syslog, from when I removed the 2 disks and rebuilt the array with 8 of the 10 data drives. The parity calculation is in there, followed by the start of a new parity check. After a few errors early on I cancelled that check, and I am currently hunting manually for corrupted files in the hope of finding a bad disk that way.

The question I'd most like answered is whether I can take drives out of the array, build it with only some of them to run tests, and then rebuild it with all the drives again without data loss. I think that's possible, but I'd rather not do it until someone says it can be done for sure =)

Thanks, and sorry for the long-windedness; I tried to edit it down to be as clear as possible.

syslog-2011-10-01.zip
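
The power figures quoted in the post above can be sanity-checked with a little arithmetic. The sketch below is only a rough estimate under stated assumptions: it assumes the 2 A spin-up rating is drawn from the 12 V rail, that all 12 drives (6 original plus 6 added) spin up at the same moment, and that the 65 W processor is fed entirely from 12 V. It ignores the motherboard, fans, and SATA cards, so treat the result as a ballpark rather than a measurement.

```python
# Rough 12 V budget at power-on. Figures come from the post unless marked
# as an ASSUMPTION.
RAIL_12V_AMPS = 53.0           # single 12 V rail rating quoted in the post
DRIVE_COUNT = 12               # 6 original drives + 6 added drives
SPINUP_AMPS_PER_DRIVE = 2.0    # spin-up rating; ASSUMPTION: drawn from 12 V
CPU_WATTS = 65.0               # CPU rating; ASSUMPTION: fed entirely from 12 V
ACTIVE_WATTS_PER_DRIVE = 6.3   # read/write power per drive, per the post


def spinup_load_amps(drives, amps_each, cpu_watts):
    """Worst case: every drive spins up at once while the CPU is at full load."""
    return drives * amps_each + cpu_watts / 12.0


peak = spinup_load_amps(DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE, CPU_WATTS)
active_watts = DRIVE_COUNT * ACTIVE_WATTS_PER_DRIVE

print(f"estimated peak 12 V draw at spin-up: {peak:.1f} A of {RAIL_12V_AMPS:.0f} A")
print(f"all drives reading/writing: {active_watts:.1f} W")
```

Under these assumptions the estimate comes to roughly 29 A at spin-up against a 53 A rail, and about 76 W for all drives during a parity check, which is consistent with the view that the power supply is not the first suspect.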
October 1, 2011 Author

In my manual testing of files I think I've found a disk that is misbehaving. I'm going to try changing cables first, and then try a different SATA port if that doesn't do it. Wish me luck =)

7a33f3653b3617769b073604d6ce26ef 10 Things I Hate About You (1999).tt0147800.1080p.mkv
a08e22dbad8d3fd4bbeff5a37d2b451d 10 Things I Hate About You (1999).tt0147800.1080p.mkv
5c963ff576f4059c0b8900a344779e94 10 Things I Hate About You (1999).tt0147800.1080p.mkv

It can't be a good sign for a disk when md5sum gives 3 different values on 3 different runs.
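
The manual check described above (hashing the same file several times and comparing the results) could be scripted along these lines. This is a minimal sketch, not the poster's actual procedure: the function names `md5_of` and `unstable_files` are made up for illustration, and it uses only Python's standard `hashlib` module.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unstable_files(paths, passes=3):
    """Hash each file `passes` times; return the files whose digests disagree.

    The result maps each suspect path to the sorted list of distinct
    digests seen. An empty dict means every file hashed consistently.
    """
    suspects = {}
    for path in paths:
        digests = {md5_of(path) for _ in range(passes)}
        if len(digests) > 1:
            suspects[str(path)] = sorted(digests)
    return suspects
```

Calling `unstable_files` with a list of files from a single disk would flag any file that hashes differently between passes. A file that hashes the same every time is not proven intact; it only means the reads were consistent with one another.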
October 1, 2011

You are on to something there... If only that disk has the problem, it could point to a bad disk or a bad disk controller port.

Joe L.
October 1, 2011 Author

Yeah, the first test has been to swap out the SATA cable. I figured I'd do the super easy step first; if I have the same problem with the second cable, which was "known good", then I'll have it narrowed down to the drive itself or the card/port. I'm doing this as I type, and so far 3 md5s of that file came back the same. I'll test more files 3-4 times each and see if they all report the same. It's possible that I used a different SATA cable when I did the preclear and initial setup; I'm trying to remember if I had all of the SATA cables I have now. I had some non-right-angle ones I was using before, and I don't remember exactly when I swapped them out.

So far, with the SATA cable swapped, here are the preliminary results:

5aba0435b16337e9b2ac181d9afebf4e 10 Things I Hate About You (1999).tt0147800.1080p.mkv
5aba0435b16337e9b2ac181d9afebf4e 10 Things I Hate About You (1999).tt0147800.1080p.mkv
5aba0435b16337e9b2ac181d9afebf4e 10 Things I Hate About You (1999).tt0147800.1080p.mkv

So far so good... Of course, God only knows if the file was already corrupted. Thankfully a bit or two here and there in video files likely won't be that big of a deal, and I can always re-rip if I need to =) Off to test a bunch more files several times each. I really hope it was something as simple as a bad cable.
October 1, 2011 Author

Well, it was going so nicely, but I had another error checking md5s. The rate of these errors has dropped dramatically since changing the cable, so I'm beginning to think it's the port on the card not making a good connection. I'm going to try a different port using the original cable and see what happens, then I'll try the second cable as well.
October 1, 2011

The cable is unlikely to be the cause, as errors from it would probably have shown up as CRC errors in the system log.

The problem with testing the same file again and again is that it will probably be read from RAM on subsequent attempts, unless it is really huge.
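
One way to reduce the caching effect mentioned above is to ask the kernel to drop the file's cached pages before each read. The sketch below assumes a Unix system with Python 3.3 or later, where `os.posix_fadvise` is available; the function name `md5_uncached` is made up for illustration. `POSIX_FADV_DONTNEED` is only a hint, so the kernel may still serve some data from RAM, and this should be read as a best-effort approach rather than a guarantee.

```python
import hashlib
import os


def md5_uncached(path, chunk_size=1 << 20):
    """Hash a file after hinting that its cached pages can be discarded.

    The hint is advisory: it makes a re-read from the physical disk more
    likely but does not guarantee one.
    """
    digest = hashlib.md5()
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 covers the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

On Linux, an alternative with a similar goal is to drop the page cache system-wide as root by writing `3` to `/proc/sys/vm/drop_caches` between passes, or simply to hash enough other data between passes that the file has been pushed out of RAM.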
October 2, 2011 Author

I think I have it narrowed down to the SATA card I was using. I have the drive hooked up to another card now and it seems to be running quite reliably, so it's time to calculate parity and check it a few times. Disk I/O speeds really are a PITA when troubleshooting an issue like this. Crossing my fingers and starting the calc and checks.