crowdx42 Posted April 8, 2017
So did you flash the card on a Windows machine or from the command line? Considering the relatively low cost, I am wondering if I should swap out my cards for these. I am not sure my cards are the issue, but I am still running with a single parity drive after two failed attempts at building the first parity. I am waiting for a new 8TB drive to finish preclear and will try again. One parity failure was a dying drive, but I am not sure what happened with the first one.
Joseph Posted April 8, 2017 Author
5 hours ago, crowdx42 said:
So did you flash the card on a Windows machine or from the command line? Considering the relatively low cost, I am wondering if I should swap out my cards for these. I am not sure my cards are the issue, but I am still running with a single parity drive after two failed attempts at building the first parity. I am waiting for a new 8TB drive to finish preclear and will try again. One parity failure was a dying drive, but I am not sure what happened with the first one.
Probably wouldn't hurt to swap them out if it offers better compatibility. FWIW, I didn't try the Windows route; I flashed from a command line with a USB stick. I had to cover 2 pins with electrical tape to get it to boot initially. Kudos to johnnie.black for steering me in the right direction! Like I said earlier, I experienced a similar problem. I was convinced it was a bad PS; it turned out to be a combo of faulty Y-splitter power cables and poor connections made by me. After re-seating everything and replacing the el cheap-o cabling, that problem went away. It's a PITA, but give it a shot; it might work. I'm running parity check number 2 and will let you know if the errors went away.
Edited April 8, 2017 by Joseph
crowdx42 Posted April 8, 2017
21 hours ago, Joseph said:
Well, I am currently starting a parity build for the first parity drive; the second parity drive was fine. For the new parity drive I got an 8TB WD, which passed a preclear, and I also upgraded from a Corsair HX850 to an HX1200. I also moved the parity drive so that the two parity drives are on separate controllers. All the drive power cabling goes directly into the backplane of the Norco 4220, and the cables are all modular ones from the Corsair PSU. Fingers crossed everything runs without any errors. I have a second 8TB which I will install if the first parity builds with no issues.
Joseph Posted April 8, 2017 Author
Unless you're just getting a batch of bum hard drives (possible, but highly unlikely), or the drives just can't handle the unRAID platform (I wound up getting rid of most of my old drives that didn't meet the criteria), there are only a few things left that I can think of: HBA, PS, data/power cabling and backplane. Doubt it's the PS, since you just replaced it. (Side note: as I'm sure you know, make sure you're in single rail mode.) I guess you could try swapping 2 HDD cables around on the backplane and see whether the problem follows the cable or stays put, to narrow down where the problem is or isn't. Then do the same for the HBA A/B port cables. Seems like that would narrow it down to one of the 3 remaining things. FYI, I know just enough to get myself in trouble, so consider the source.
Afterthought: Are you using ECC memory?
Edited April 8, 2017 by Joseph after thought
crowdx42 Posted April 8, 2017
So I have done pretty much everything you mentioned. I moved the drive to a different controller by moving it to a different drive bay. The only thing I have not swapped is the backplane, BUT everything was working a little over a week ago with no parity errors. The weird thing I am wondering about is that the second parity disk checked parity without any errors; it is only the first parity drive that had errors. BUT I did get unlucky with drives, as the drive I swapped in after the first parity issue had bad sectors and then also had some SMART errors, and I am now RMAing that drive. I am not using ECC memory, just two sticks of DDR3.
Joseph Posted April 8, 2017 Author
28 minutes ago, crowdx42 said:
So I have done pretty much everything you mentioned. I moved the drive to a different controller by moving it to a different drive bay. The only thing I have not swapped is the backplane, BUT everything was working a little over a week ago with no parity errors. The weird thing I am wondering about is that the second parity disk checked parity without any errors; it is only the first parity drive that had errors. BUT I did get unlucky with drives, as the drive I swapped in after the first parity issue had bad sectors and then also had some SMART errors, and I am now RMAing that drive. I am not using ECC memory, just two sticks of DDR3.
Yeah, I gotcha... then probably the 'easiest' thing to try is replacing the HBAs (look, we're back where we started! lol). Relatively speaking, it's inexpensive. It just doesn't make sense that everything was working and now it's not. When you said you moved stuff to a new case, that's what got me focused on the cabling route. (Again, I also based that on my personal experience.) But if that doesn't fix it, maybe try running a robust memory test overnight, since it's non-ECC; it sucks being down that long though. Also, you could recheck SMART for some of the drives in question on a known good box and see what happens. Grab another USB stick, put the unRAID trial on it and just test SMART one drive at a time. Let me know how it goes.
Afterthought: do you have the latest mobo and controller BIOSes? You could also 'stress test' some of the drives in question via preclear on the known good box if they pass SMART.
Edited April 8, 2017 by Joseph after thought
crowdx42 Posted April 8, 2017
I have not messed with the BIOS on the MB or the controllers, BUT it would be weird for them to work and then stop working. Also, everything was working in the new setup and it had done two consecutive weekend parity checks with no errors. Then it gave me around 200 errors twice in a row and then freaked out with 17k errors, which I assumed was a drive failing (I did have a drive failing, but it was not one of the parity drives). I have my fingers and toes crossed that the new parity drive will fix the issue. The frustration with issues like this is that it makes the whole system untrustworthy and defeats the purpose of backing up to it.
Joseph Posted April 8, 2017 Author
On 4/3/2017 at 10:04 AM, EdgarWallace said:
I have installed a SAS2LP in my Backup Server, disabled VT-d, and am still having parity check errors. I ordered a Dell Perc H200 today and will report back on whether that card resolves my issues. Btw, is it safe to run the parity check with the "Write corrections to parity" option once the new controller is installed?
Yes, as long as your data drives don't have any corruption on them, writing corrections to parity will 'freshen' the parity data so a data drive can be rebuilt properly. Currently, I'm doing 2 parity checks: one to 'fix' the 5 errors I was receiving with the old controller, and a second one to see if the errors went away. I will post my findings when it's through. If you (or I) are still receiving parity errors after replacing the HBA, then there's something else going on that needs to be addressed. Has your card come in yet?
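For anyone wondering why stale parity matters for a rebuild, here is a minimal Python sketch. It assumes the first (P) parity is a plain XOR across the data drives; the second (Q) parity uses a more involved calculation that is not modelled here. The block contents and the names `xor_parity` and `rebuild_missing` are made up for illustration and are not unRAID code.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors plus parity."""
    return xor_parity(surviving_blocks + [parity])


# Three hypothetical data drives, two bytes each.
data = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]

# Parity that matches the data, and a copy with one flipped bit (stale parity).
parity = xor_parity(data)
stale_parity = bytes([parity[0] ^ 0x01, parity[1]])

# Pretend drive 1 died and rebuild it from drives 0 and 2 plus parity.
good = rebuild_missing([data[0], data[2]], parity)
bad = rebuild_missing([data[0], data[2]], stale_parity)

print("rebuild with fresh parity matches:", good == data[1])
print("rebuild with stale parity matches:", bad == data[1])
```

The same arithmetic cuts the other way: if a data drive is the one returning bad bytes, a correcting check rewrites parity to match the bad data, which is why the answer above is conditional on the data drives being clean.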
Joseph Posted April 8, 2017 Author
9 minutes ago, crowdx42 said:
The frustration with issues like this is that it makes the whole system untrustworthy and defeats the purpose of backing up to it.
Indeed!! I felt like that too... it's been a long, interesting year trying to get it to work the way I want it to. I had to rethink some things I had on my old box and I went way over budget on hardware. But if I can resolve these last parity errors, I will be extremely happy with the scalability and functionality of unRAID. The 2 remaining concerns are a little more esoteric: combating data rot and ransomware. I hope I can spend some time soon understanding them and then implementing solutions.
Joseph Posted April 8, 2017 Author
NEW CARD: DID NOT FIX PROBLEM!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!! FAIL!!
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
Bum hard drive(s)? I'm open to any ideas/suggestions. Is there a 1-click SMART report that can be run on all drives in the array? Or does each one have to be done manually?
Edited April 8, 2017 by Joseph update
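On the SMART question, one way to avoid doing each drive by hand is a small script that loops over the disks. This is only a sketch under assumptions: it assumes `smartctl` (from smartmontools) is on the PATH, that the drives show up as `/dev/sdX`, and that it is run as root. The function names are made up for illustration. `-H` prints the overall health verdict and `-a` prints the full report.

```python
import glob
import shutil
import subprocess


def list_disks(pattern="/dev/sd?"):
    """Return whole-disk device nodes matching the pattern, sorted.

    Note: "/dev/sd?" only covers sda..sdz; widen the pattern for more drives.
    """
    return sorted(glob.glob(pattern))


def smart_command(device, full=False):
    """Build the smartctl command line for one device."""
    return ["smartctl", "-a" if full else "-H", device]


def smart_report(device, full=False):
    """Run smartctl on one device and return whatever text it printed."""
    try:
        result = subprocess.run(
            smart_command(device, full),
            capture_output=True,
            text=True,
            timeout=120,
        )
    except subprocess.TimeoutExpired:
        return f"{device}: smartctl timed out"
    return result.stdout or result.stderr


if __name__ == "__main__":
    if shutil.which("smartctl") is None:
        print("smartctl not found; install smartmontools first")
    else:
        for disk in list_disks():
            print(f"===== {disk} =====")
            print(smart_report(disk))
```

Drives behind some controllers may need an extra `-d` device-type option to smartctl, which this sketch does not try to guess.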
crowdx42 Posted April 8, 2017
DAMN!!!!! Well, that really sucks. I googled the error you are getting, and for one user it took over a year to go away by itself? WTF
http://lime-technology.com/oldforum/index.php?topic=38359.105
Joseph Posted April 8, 2017 Author
Thanks man, I'm looking at the previous log to see if the sector is the same.
JorgeB Posted April 8, 2017
If you haven't rebooted since both checks, post the diagnostics.
crowdx42 Posted April 8, 2017
Btw, are you on the latest unRAID version? I am on the latest and I am wondering if it could be related. If I recall correctly, I updated to the latest revision just before I started having issues. It's pretty unlikely the parity algorithm changed, but it is making me wonder lol
Joseph Posted April 8, 2017 Author
From Apr 2 (old HBA):
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069768
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069800
From today (replacement HBA; 1st check):
searched log; can't find it, because unRAID was rebooted.
From today (replacement HBA; 2nd check, still running):
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
Full syslog from today (2nd check, still running) attached. This is happening somewhere between 0% and 48.7% of the way through.
Edited April 20, 2017 by Joseph update
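Comparing two checks by eye gets tedious, so here is a rough Python sketch that pulls the corrected sectors out of syslog text and diffs two runs. The regular expression is based only on the log lines quoted in this thread, so treat the exact line format as an assumption; the function names are made up for illustration.

```python
import re

# Matches lines such as:
#   "md: recovery thread: PQ corrected, sector=3519069768"
LINE = re.compile(r"md: recovery thread: (PQ|P|Q) corrected, sector=(\d+)")


def corrected_sectors(log_text):
    """Map each corrected sector number to the parity label logged for it."""
    return {int(m.group(2)): m.group(1) for m in LINE.finditer(log_text)}


def label_changes(old, new):
    """Return sectors present in both runs whose parity label differs."""
    return {s: (old[s], new[s]) for s in old.keys() & new.keys() if old[s] != new[s]}


APR_2 = """
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069768
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 2 04:59:10 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 2 04:59:10 Tower kernel: md: recovery thread: PQ corrected, sector=3519069800
"""

APR_8 = """
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069768
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069776
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069784
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069792
Apr 8 15:54:27 Tower kernel: md: recovery thread: Q corrected, sector=3519069800
"""

old = corrected_sectors(APR_2)
new = corrected_sectors(APR_8)
changed = label_changes(old, new)

print("same sectors in both runs:", sorted(old) == sorted(new))
for sector, (before, after) in sorted(changed.items()):
    print(f"sector {sector}: {before} -> {after}")
```

On the two excerpts above, the sector numbers are identical in both runs, and only the first and last sectors change label (PQ on Apr 2, Q on Apr 8).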
Joseph Posted April 8, 2017 Author
19 minutes ago, johnnie.black said:
If you didn't reboot since both checks post the diagnostics.
I just posted diags from about 5 or so minutes ago; that should have all the data from the reboot yesterday going forward... correct? Or should I post both logs? Turns out I rebooted first thing this morning. Would you suggest cancelling the parity check and re-running it to see if the sector or the parity drive changes? It looks like the sectors are the same as in the Apr 2 report; only the affected parity drive changed.
Edited April 8, 2017 by Joseph corrected
Joseph Posted April 8, 2017 Author
10 minutes ago, crowdx42 said:
Btw, are you on the latest unRAID version? I am on the latest and I am wondering if it could be related. If I recall correctly, I updated to the latest revision just before I started having issues. It's pretty unlikely the parity algorithm changed, but it is making me wonder lol
I'm running 6.3.3, but it's been going on for a while, I think.
JorgeB Posted April 8, 2017
15 minutes ago, Joseph said:
I just posted diags from about 5 or so minutes ago; that should have all the data from the reboot yesterday going forward... correct? Or should I post both logs? Turns out I rebooted first thing this morning. Would you suggest cancelling the parity check and re-running it to see if the sector or the parity drive changes? It looks like the sectors are the same as in the Apr 2 report; only the affected parity drive changed.
Yes, cancel and start again, to see if the errors repeat, and also because we can't tell whether the first check was a correcting one.
Joseph Posted April 8, 2017 Author
1 minute ago, johnnie.black said:
Yes, cancel and start again, to see if the errors repeat and also since we can't see if the first check was correcting.
Thanks for your help. Will report back.
Joseph Posted April 9, 2017 Author
100% complete (updated from 52.3%) and no errors... now I'm thoroughly confused! I probably won't check it any more tonight, but will post results in the AM.
UPDATE: Logs attached. I'm gonna start another parity check without rebooting. If all goes well, I will reboot and run it again to see if the errors return after a reboot.
Last check completed on Sun 09 Apr 2017 03:32:02 AM CDT (today), finding 0 errors.
Duration: 10 hours, 35 minutes, 47 seconds. Average speed: 104.9 MB/sec
Edited April 20, 2017 by Joseph update
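As a sanity check on where the corrected sectors sit, the reported duration and average speed can be turned into an approximate size, and the first corrected sector from the earlier logs into a position. This is a back-of-the-envelope sketch with two assumptions: that the sector numbers in the log are in 512-byte units, and that "MB/sec" means decimal megabytes. Neither is confirmed in the thread, and the function names are made up for illustration.

```python
SECTOR_BYTES = 512    # assumption: log sector numbers are 512-byte sectors
BYTES_PER_MB = 10**6  # assumption: "MB/sec" is decimal megabytes


def duration_seconds(hours, minutes, seconds):
    """Convert an hours/minutes/seconds duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def bytes_scanned(avg_mb_per_sec, elapsed_seconds):
    """Estimate how many bytes a check covered from its average speed."""
    return avg_mb_per_sec * BYTES_PER_MB * elapsed_seconds


def position_fraction(sector, total_bytes):
    """Fraction of the way through the check at which a sector sits."""
    return sector * SECTOR_BYTES / total_bytes


elapsed = duration_seconds(10, 35, 47)       # reported duration
total = bytes_scanned(104.9, elapsed)        # reported average speed
first_bad = position_fraction(3519069768, total)

print(f"elapsed: {elapsed} s")
print(f"estimated size checked: {total / 1e12:.2f} TB")
print(f"first corrected sector is roughly {first_bad:.0%} of the way in")
```

Under those assumptions the check covered roughly 4 TB and the corrected sectors sit at about 45% of the way through, which fits the earlier observation that the errors showed up somewhere before 48.7%.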
crowdx42 Posted April 9, 2017
Well, my parity build succeeded. So now my dilemma is: should I do a parity check now and add the second 8TB parity drive once that is successful, or add it now and do the parity check once it finishes rebuilding? The only error I see popping up is the one below, which does not seem to have affected the parity rebuild. I also attached the full logs below.
sas: sas_eh_handle_sas_errors: task 0xffff880211e5e200 is aborted
unraid_6-diagnostics-20170409-0744.zip
Joseph Posted April 9, 2017 Author
44 minutes ago, crowdx42 said:
Well, my parity build succeeded. So now my dilemma is: should I do a parity check now and add the second 8TB parity drive once that is successful, or add it now and do the parity check once it finishes rebuilding? The only error I see popping up is the one below, which does not seem to have affected the parity rebuild. I also attached the full logs below.
sas: sas_eh_handle_sas_errors: task 0xffff880211e5e200 is aborted
unraid_6-diagnostics-20170409-0744.zip
I looked at your logs, but I'm afraid it's over my head. Not to muddy the waters, but I found this: https://forums.lime-technology.com/topic/56232-parity-check-found-errors/
My new issue notwithstanding, it sounds like the SAS2LPs just won't cut it for some who run unRAID, so I'd encourage you to replace them first, then do parity checks. If that isn't an option at this time, I'd run a check first just to see, or you could add the 8TB and rebuild parity, then run a check on that afterwards to be sure... but again, if it was me, I'd get the cards replaced first.
crowdx42 Posted April 9, 2017
Did your parity check succeed? I think I will wait to replace my SAS2LPs (I have 3 in total, 2 in the main server and 1 in the backup). I am also wondering if I should check what firmware they are on; I'm not sure how to do that without pulling them and putting them into a Windows machine.
Joseph Posted April 9, 2017 Author
7 minutes ago, crowdx42 said:
Did your parity check succeed? I think I will wait to replace my SAS2LPs (I have 3 in total, 2 in the main server and 1 in the backup). I am also wondering if I should check what firmware they are on; I'm not sure how to do that without pulling them and putting them into a Windows machine.
With the new card installed, the first parity check corrected 5 errors (this was to be expected). However, the second parity check found the same 5 errors. Parity check 3 came back OK. I'm running check 4 and will post what I find. If it goes well, the plan is to reboot afterward and try it again.
Re the SAS2LP flash: if memory serves, the one I bought new needed to be flashed, but they haven't had a new firmware update for a while. I downloaded .1812 back in May of 2016.
https://www.supermicro.com/products/accessories/addon/aoc-sas2lp-mv8.cfm
crowdx42 Posted April 9, 2017
Sounds like your issues may finally be resolved. I am not sure if you posted this already: what motherboard, CPU, memory etc. are you running? My main setup is all Intel, so I am wondering if there is any connection. Also, for the SAS card update, did you have to check the firmware version via the command line?