April 8, 2017 So did you flash the card on a Windows machine or from the command line? Considering the relatively low cost, I am wondering if I should swap out my cards for these. I am not sure if my cards are an issue, but I am still running with a single parity drive following two failures in creating the 1st parity. I am waiting for a new 8TB drive to finish preclear and will try again. One parity drive failure was a dying drive, but I am not sure what happened with the first one.
April 8, 2017 Author 5 hours ago, crowdx42 said: So did you flash the card on a Windows machine or from the command line? Considering the relatively low cost, I am wondering if I should swap out my cards for these. I am not sure if my cards are an issue, but I am still running with a single parity drive following two failures in creating the 1st parity. I am waiting for a new 8TB drive to finish preclear and will try again. One parity drive failure was a dying drive, but I am not sure what happened with the first one.
Probably wouldn't hurt to swap out if it offers better compatibility. FWIW, I didn't try the Windows route; I flashed from a command line with a USB stick. I had to cover 2 pins with electrical tape to get it to initially boot. Kudos to johnnie.black for steering me in the right direction! Like I said earlier, I experienced a similar problem. I was convinced it was a bad PS; it turned out to be a combo of faulty Y-splitter power cables and poor connections made by me. After re-seating everything and replacing the el cheap-o cabling, that problem went away. It's a PITA, but give it a shot; it might work. I'm running parity check number 2 and will let you know if the errors went away.
Edited April 8, 2017 by Joseph
April 8, 2017 21 hours ago, Joseph said:
Well, I am currently starting a parity build for the first parity drive; the second parity drive was fine. For the new parity drive I got an 8TB WD, which passed a preclear, and I also upgraded from a Corsair HX850 to an HX1200. I also moved the parity drive so that both parity drives are on separate controllers. All the drive power cabling goes directly into the backplane of the Norco 4220, and the cables are all modular from the Corsair PSU. Fingers crossed everything runs without any errors. I have a second 8TB drive, which I will install if the first parity builds with no issues.
April 8, 2017 Author Unless you're just getting a batch of bum hard drives (possible, but highly unlikely), or the drives just can't handle the unRAID platform (I wound up getting rid of most of my old drives that didn't meet the criteria), then there are only a few things left that I can think of: HBA, PS, data/power cabling and backplane. I doubt it's the PS, since you just replaced it. (Side note: as I'm sure you know, make sure you're in single rail mode.) I guess you could try swapping 2 HDD cables around on the backplane and see whether the problem follows the cable swap or stays put, to narrow down where the problem is or isn't. Then do the same for the HBA A/B port cables. Seems like that would narrow it down to one of the 3 remaining things. FYI, I know just enough to get myself in trouble, so consider the source.
Afterthought: Are you using ECC memory?
Edited April 8, 2017 by Joseph (afterthought)
April 8, 2017 So I have done pretty much everything you have mentioned. I moved the drive to a different controller by moving it to a different drive bay. The only thing that I have not swapped is the backplane, BUT everything was working a little over a week ago with no parity errors. The weird thing is that the second parity disk checked parity without getting any errors; it is only the first parity drive that had errors. BUT I did get unlucky with drives: the drive I swapped after the first parity issue had bad sectors and then also had some SMART errors, and I am now RMAing that drive. I am not using ECC memory, just two sticks of DDR3.
April 8, 2017 Author 28 minutes ago, crowdx42 said: So I have done pretty much everything you have mentioned. I moved the drive to a different controller by moving it to a different drive bay. The only thing that I have not swapped is the backplane, BUT everything was working a little over a week ago with no parity errors. The weird thing is that the second parity disk checked parity without getting any errors; it is only the first parity drive that had errors. BUT I did get unlucky with drives: the drive I swapped after the first parity issue had bad sectors and then also had some SMART errors, and I am now RMAing that drive. I am not using ECC memory, just two sticks of DDR3.
Yeah, I gotcha... then probably the 'easiest' thing to try is replacing the HBAs (look, we're back where we started! lol). Relatively speaking, it's inexpensive. It just doesn't make sense that everything was working and now it's not. When you said you moved stuff to a new case, that's what got me focused on the cabling route (again, I also based that on my personal experience). But if it doesn't fix it, maybe try running a robust memory test overnight since it's non-ECC; it sucks being down that long, though. Also, you could recheck SMART on some of the drives in question on a known good box and see what happens. Grab another USB stick, put the unRAID trial on it and just test SMART one drive at a time. Let me know how it goes.
Afterthought: do you have the latest mobo and controller BIOSes? You could also 'stress test' some of the drives in question via preclear on the known good box if they pass SMART.
Edited April 8, 2017 by Joseph (afterthought)
April 8, 2017 I have not messed with the BIOS on the MB or controllers, BUT it would be weird that they worked and then did not work. Also, everything was working in the new setup and it had done two consecutive weekend parity checks with no errors; then it gave me around 200 errors twice in a row and then freaked out with 17k errors, which I assumed was a drive failing (I did have a drive failing, but not one of the parity drives). I have my fingers and toes crossed that the new parity drive will fix the issue. The frustration with issues like this is that it makes the full system untrustworthy and defeats the purpose of backing up to it, etc.
April 8, 2017 Author On 4/3/2017 at 10:04 AM, EdgarWallace said: I have installed a SAS2LP in my Backup Server, disabled VT-d and am still having parity check errors. I ordered a Dell PERC H200 today and will report back if that card is going to resolve my issues. Btw, is it safe to run the parity check once the new controller is installed with the "Write corrections to parity" option?
Yes, as long as your data drives don't have any corruption on them, writing corrections to parity will 'freshen' the parity data so a data drive can properly be rebuilt. Currently, I'm doing 2 parity checks: one to 'fix' the 5 errors I was receiving with the old controller, and a second one to see if the errors went away. I will post my findings when it's through. If you (or I) are still receiving parity errors after replacing the HBA, then there's something else going on that needs to be addressed. Has your card come in yet?
April 8, 2017 Author 9 minutes ago, crowdx42 said: The frustration with issues like this is that it makes the full system untrustworthy and defeats the purpose of backing up to it, etc.
Indeed!! I felt like that too... it's been a long, interesting year trying to get it to work the way I want it to. I had to rethink some things I had on my old box, and I went way over budget with hardware. But if I can resolve these last parity errors, I will be extremely happy with the scalability and functionality of unRAID. The 2 remaining concerns are a little more esoteric: combating data rot and ransomware, which I hope I can spend some time understanding and then implementing solutions for soon.
April 8, 2017 Author NEW CARD: DID NOT FIX PROBLEM!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!!
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
Bum hard drive(s)? I'm open to any ideas/suggestions. Is there a 1-click SMART report that can be run on all drives in the array, or does each one have to be done manually?
Edited April 8, 2017 by Joseph (update)
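On the question of checking SMART on every drive in one go: outside the unRAID GUI, one possible approach is a small script that loops over whatever smartmontools can see. The sketch below is only an illustration, not an unRAID feature. It assumes `smartctl` is installed and on the PATH, that the script runs as root, and that `smartctl --scan` prints one device per line starting with its `/dev/...` path. The function names (`parse_scan`, `health_report`, `main`) are made up for this example.

```python
import shutil
import subprocess


def parse_scan(scan_output):
    """Return the /dev/... paths found at the start of each `smartctl --scan` line."""
    devices = []
    for line in scan_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("/dev/"):
            devices.append(fields[0])
    return devices


def health_report(devices):
    """Run `smartctl -H` (overall health) on each device and keep the raw text."""
    report = {}
    for dev in devices:
        result = subprocess.run(
            ["smartctl", "-H", dev], capture_output=True, text=True
        )
        report[dev] = result.stdout
    return report


def main():
    # Assumption: smartmontools is installed; bail out politely if it is not.
    if shutil.which("smartctl") is None:
        print("smartctl not found on PATH")
        return
    scan = subprocess.run(["smartctl", "--scan"], capture_output=True, text=True)
    for dev, text in health_report(parse_scan(scan.stdout)).items():
        print(f"=== {dev} ===")
        print(text)


if __name__ == "__main__":
    main()
```

Swapping `-H` for `-a` would dump the full attribute table per drive instead of just the pass/fail summary. Drives behind some controllers may need an explicit `-d` device type, which this sketch does not handle.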
April 8, 2017 DAMN!!!!! Well, that really sucks. I googled that error you are getting, and for one user it took over a year to go away by itself? WTF http://lime-technology.com/oldforum/index.php?topic=38359.105
April 8, 2017 Author Thanks man, I'm looking at the previous log to see if the sector is the same.
April 8, 2017 Btw, are you on the latest unRAID version? I am on the latest and I am wondering if it could be related. If I recall correctly, it was just before I started having issues that I updated to the latest revision. It's pretty unlikely the parity algorithm changed, but it is making me wonder lol
April 8, 2017 Author From Apr 2 (old HBA):
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069768
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069800
From today (replacement HBA; 1st check): searched the log; can't find it, because unRAID was rebooted.
From today (replacement HBA; 2nd check, still running):
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
Full syslog from today (2nd check, still running) attached. This is happening somewhere between 0% and 48.7% of the way to completion.
Edited April 20, 2017 by Joseph (update)
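For anyone wanting to do the same sector comparison without eyeballing it, here is a minimal Python sketch. It assumes the syslog lines look exactly like the excerpts in this post (`md: recovery thread: <P|Q|PQ> corrected, sector=<n>`); the helper name `corrected_sectors` and the two sample strings are made up for illustration. In practice you would read the two syslog files instead.

```python
import re

# Matches the md driver lines quoted in the post, e.g.
# "md: recovery thread: PQ corrected, sector=3519069768"
LINE_RE = re.compile(r"md: recovery thread: (PQ|P|Q) corrected, sector=(\d+)")


def corrected_sectors(log_text):
    """Map each corrected sector number to the parity tag (P, Q or PQ) logged for it."""
    return {int(m.group(2)): m.group(1) for m in LINE_RE.finditer(log_text)}


# Sample data copied from the two excerpts in the post.
APR2_LOG = """\
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069768
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069800
"""
APR8_LOG = """\
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
"""

old = corrected_sectors(APR2_LOG)
new = corrected_sectors(APR8_LOG)

# Sectors reported in both runs, and the ones whose parity tag differs.
common = sorted(old.keys() & new.keys())
changed = {s: (old[s], new[s]) for s in common if old[s] != new[s]}

print("sectors in both runs:", common)
print("tag changed (old, new):", changed)
```

With the excerpts above, all five sectors show up in both runs, and two of them (3519069768 and 3519069800) go from PQ to Q, which matches the observation that the sectors stayed the same and only the parity involved changed.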
April 8, 2017 Author 19 minutes ago, johnnie.black said: If you didn't reboot since both checks, post the diagnostics.
I just posted diags from about 5 or so minutes ago; that should have all the data from the reboot yesterday going forward... correct? Or should I post both logs? Turns out I rebooted first thing this morning. Would you suggest cancelling the parity check and re-running it to see if the sector or parity drive changes? It looks like the sectors are the same as in the Apr 2 report; only the parity drive affected changed.
Edited April 8, 2017 by Joseph (corrected)
April 8, 2017 Author 10 minutes ago, crowdx42 said: Btw, are you on the latest unRAID version? I am on the latest and I am wondering if it could be related. If I recall correctly, it was just before I started having issues that I updated to the latest revision. It's pretty unlikely the parity algorithm changed, but it is making me wonder lol
I'm running 6.3.3, but it's been going on for a while, I think.
April 8, 2017 Community Expert 15 minutes ago, Joseph said: I just posted diags from about 5 or so minutes ago; that should have all the data from the reboot yesterday going forward... correct? Or should I post both logs? Turns out I rebooted first thing this morning. Would you suggest cancelling the parity check and re-running it to see if the sector or parity drive changes? It looks like the sectors are the same as in the Apr 2 report; only the parity drive affected changed.
Yes, cancel and start again, to see if the errors repeat, and also since we can't see if the first check was correcting.
April 8, 2017 Author 1 minute ago, johnnie.black said: Yes, cancel and start again, to see if the errors repeat, and also since we can't see if the first check was correcting.
Thanks for your help. Will report back.
April 9, 2017 Author 52.3% 100% complete and no errors... now I'm thoroughly confused! I probably won't check it any more tonight, but will post results in the a.m.
UPDATE: Logs attached. I'm gonna start another parity check without rebooting. If all goes well, I will reboot and run it again to see if the errors return on reboot.
Last check completed on Sun 09 Apr 2017 03:32:02 AM CDT (today), finding 0 errors. Duration: 10 hours, 35 minutes, 47 seconds. Average speed: 104.9 MB/sec
Edited April 20, 2017 by Joseph (update)
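As a rough sanity check on those numbers, the reported duration and average speed can be turned into an approximate parity size, and the sector from the earlier syslog excerpts into a position within the check. This is only back-of-the-envelope arithmetic under two assumptions that are not stated in the thread: that "MB" here means 10^6 bytes, and that the md driver counts sectors in 512-byte units.

```python
# Assumptions (not confirmed in the thread): decimal megabytes, 512-byte sectors.
MB = 1_000_000
SECTOR_BYTES = 512

# "Duration: 10 hours, 35 minutes, 47 seconds. Average speed: 104.9 MB/sec"
duration_s = 10 * 3600 + 35 * 60 + 47
avg_speed_bytes_per_s = 104.9 * MB

# Approximate amount of data covered by the check (roughly the parity size).
approx_size_bytes = duration_s * avg_speed_bytes_per_s

# First sector reported in the earlier syslog excerpts.
first_sector = 3519069768
offset_bytes = first_sector * SECTOR_BYTES

# How far into the check that sector sits, as a fraction of the total.
fraction = offset_bytes / approx_size_bytes

print(f"duration: {duration_s} s")
print(f"approx size covered: {approx_size_bytes / 1e12:.2f} TB")
print(f"error offset: {offset_bytes / 1e12:.2f} TB ({fraction:.1%} of the way through)")
```

Under those assumptions the check covered roughly 4 TB and the corrected sectors sit about 45% of the way in, which is consistent with the errors appearing somewhere before the 48.7% mark mentioned earlier.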
April 9, 2017 Well, my parity build succeeded. So now my dilemma is whether I should do a parity check now and, once that is successful, add the second 8TB parity drive, or add it now and do the parity check once it completes rebuilding. The only error I see popping up is the one below, which does not seem to have affected the parity rebuild. I also attached the full logs below.
sas: sas_eh_handle_sas_errors: task 0xffff880211e5e200 is aborted
unraid_6-diagnostics-20170409-0744.zip
April 9, 2017 Author 44 minutes ago, crowdx42 said: Well, my parity build succeeded. So now my dilemma is whether I should do a parity check now and, once that is successful, add the second 8TB parity drive, or add it now and do the parity check once it completes rebuilding. The only error I see popping up is the one below, which does not seem to have affected the parity rebuild. I also attached the full logs below. sas: sas_eh_handle_sas_errors: task 0xffff880211e5e200 is aborted unraid_6-diagnostics-20170409-0744.zip
I looked at your logs, but I'm afraid it's over my head. Not to muddy the waters, but I found this: https://forums.lime-technology.com/topic/56232-parity-check-found-errors/
My new issue notwithstanding, it sounds like the SAS2LPs just won't cut it for some who run unRAID, so I'd encourage you to replace them first, then do parity checks. If that isn't an option at this time, I'd run a check first just to see, or you could try the 8TB drive and rebuild parity, then run a check on that afterwards to be sure... but again, if it was me, I'd get the cards replaced first.
April 9, 2017 Did your parity check succeed? I think I will wait to replace my SAS2LPs (I have 3 in total, 2 on the main and 1 on the backup). I am also wondering if I should check to see what firmware they are on; I'm not sure how to do that without pulling them and putting them into a Windows machine.
April 9, 2017 Author 7 minutes ago, crowdx42 said: Did your parity check succeed? I think I will wait to replace my SAS2LPs (I have 3 in total, 2 on the main and 1 on the backup). I am also wondering if I should check to see what firmware they are on; I'm not sure how to do that without pulling them and putting them into a Windows machine.
With the new card installed, the first parity check corrected 5 errors (this was to be expected). However, the second parity check found the same 5 errors. Parity check 3 came back OK. I'm running check 4 and will post what I find. If it goes well, the plan is to reboot afterward and try it again.
Re the SAS2LP flash: if memory serves, the one I bought new needed to be flashed, but they haven't had a new firmware update for a while. I downloaded .1812 back in May of 2016. https://www.supermicro.com/products/accessories/addon/aoc-sas2lp-mv8.cfm
April 9, 2017 Sounds like your issues may finally be resolved. I am not sure if you posted this: what motherboard, CPU, memory, etc. are you running? My main setup is all Intel, so I am wondering if there is any connection. Also, for the SAS card update, did you have to check via command line?