February 28, 2017 Hello. Could anyone tell me from the attached log what is the most likely cause of the Disk 6 (sdi) errors and the drive being disabled during a parity check? I recently added 2 new drives -- sdf to the array as Disk 7 and sdg as a hot spare -- and this was the first parity check run after that. (Parity was checked immediately before the installation of the new drives with 0 errors returned, and nothing had been written to the server in between.) This is my main server but I run it remotely, so I stopped the array, replaced sdi with the new spare sdg, and am now running a Parity Sync / Data Rebuild. I haven't physically removed the disabled drive. What I'm trying to determine remotely, if I can, is whether the drive itself failed or whether it just got disconnected during the parity check due to a loose cable, etc. (I can't recall if this drive is connected to the motherboard or to the SATA controller card, and I don't know how to tell that from the log either.) Bottom line: if I know the drive itself failed, I can do an advance RMA and simply go out to swap in the replacement drive once I receive it. If it's more likely a loose connection, I've got to drive 3 hours to my niece, who hosts the server, just to check the cables first, which I'd of course rather avoid if I don't have to. If that can't be determined from the log, let me know if there's any other info I can provide or diagnostics I can run to save me the trip. Thanks! syslog.txt
February 28, 2017 Community Expert You should always go to Tools - Diagnostics and post the complete diagnostics zip instead of just the syslog. The diagnostics zip includes syslog(s), SMART for all disks, and a lot of other useful information.
February 28, 2017 Author 1 hour ago, trurl said: You should always go to Tools - Diagnostics and post the complete diagnostics zip instead of just the syslog. The diagnostics zip includes syslog(s), SMART for all disks, and a lot of other useful information. Thanks for the tip. Diagnostics zip is attached. jbox-diagnostics-20170228-1454.zip
February 28, 2017 Community Expert Looks like it corrected some parity errors before the read errors on disk6 started, followed by the write errors on disk6 which disabled it. The original disk6 Feb 26 13:39:24 JBOX kernel: md: import disk6: (sdi) WDC_WD60EFRX-68MYMN1_WD-WX91D65DCDSN size: 5860522532 isn't reporting its SMART, so I can't really tell anything about the disk, but it is probably a bad connection. I am concerned about the parity corrections though, since there shouldn't have been any parity errors if you cleared disk7 before adding it. You didn't set a New Config to add the disks, did you? Incorrect parity corrections would compromise rebuilding disk6. Check connections and try to get another diagnostic that shows SMART for original disk6 WDC_WD60EFRX-68MYMN1_WD-WX91D65DCDSN
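For anyone wanting to map array slots to device names and drive identifiers from a syslog, here is a minimal Python sketch. It assumes every import line follows the shape of the single `md: import disk6` line quoted in the post above; the regular expression and the `parse_md_imports` helper are illustrative, not part of unRAID. Note that these lines do not say which controller a drive hangs off.

```python
import re

# Pattern modeled on the one "md: import diskN" line quoted in this thread.
# Assumption: other unRAID releases log the same fields in the same order.
IMPORT_RE = re.compile(
    r"md: import disk(?P<slot>\d+): \((?P<dev>\w+)\) (?P<ident>\S+) size: (?P<size>\d+)"
)


def parse_md_imports(lines):
    """Return {slot number: (device name, drive identifier, size as logged)}."""
    slots = {}
    for line in lines:
        m = IMPORT_RE.search(line)
        if m:
            slots[int(m["slot"])] = (m["dev"], m["ident"], int(m["size"]))
    return slots


# The log line quoted by trurl, used here as sample input.
sample = [
    "Feb 26 13:39:24 JBOX kernel: md: import disk6: (sdi) "
    "WDC_WD60EFRX-68MYMN1_WD-WX91D65DCDSN size: 5860522532",
]

for slot, (dev, ident, size) in parse_md_imports(sample).items():
    print(f"disk{slot}: /dev/{dev} {ident} size={size}")
```

To run it against a real log, pass the lines of `syslog.txt` instead of `sample`.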
February 28, 2017 Community Expert Looks like the typical SAS2LP issue. The disk looks fine. Try disabling VT-d if not needed and/or look for a board BIOS update.
February 28, 2017 Community Expert Just now, johnnie.black said: Looks like the typical SAS2LP issue. The disk looks fine. Try disabling VT-d if not needed and/or look for a board BIOS update. The original disk6 isn't reporting SMART. The replacement disk6 does look fine, but I think the rebuild may be compromised.
February 28, 2017 Community Expert Yes, the problems started much earlier, during a parity check. Parity was incorrectly updated due to the errors. The original disk6 should be fine, but confirm by getting a SMART report. The rebuilt disk will have some corruption, so you should re-use the old disk unless there's really a problem with it.
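Why an incorrectly "corrected" parity spoils a later rebuild can be seen with a toy example. The sketch below treats each disk as a single 4-bit value and uses XOR as single parity; the disk contents, the bad read value, and the helper names are all made up for illustration, and it assumes the controller returned wrong data for disk 6 during the correcting check.

```python
from functools import reduce


def xor_parity(blocks):
    """Single parity: XOR of the same block across all data disks."""
    return reduce(lambda a, b: a ^ b, blocks, 0)


def rebuild(parity, surviving_blocks):
    """Reconstruct a missing disk's block from parity and the other disks."""
    return parity ^ xor_parity(surviving_blocks)


# Toy array: one 4-bit "block" per disk (hypothetical values).
disks = {5: 0b1010, 6: 0b0110, 7: 0b0000}  # disk 7 was cleared, so all zeros
good_parity = xor_parity(disks.values())

# Correcting parity check while disk 6 returns wrong data (assumed 0b0111):
# parity gets rewritten to match the bad read instead of the real contents.
seen_by_check = dict(disks)
seen_by_check[6] = 0b0111
bad_parity = xor_parity(seen_by_check.values())

surviving = [disks[5], disks[7]]
print(format(rebuild(good_parity, surviving), "04b"))  # 0110, the real disk 6
print(format(rebuild(bad_parity, surviving), "04b"))   # 0111, corrupted rebuild
```

The rebuilt disk then agrees with the bad parity, which is why the array can report parity as valid afterwards even though the rebuilt data differs from what was on the original disk.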
March 1, 2017 Author 1 hour ago, johnnie.black said: Yes, the problems started much earlier, during a parity check. Parity was incorrectly updated due to the errors. The original disk6 should be fine, but confirm by getting a SMART report. The rebuilt disk will have some corruption, so you should re-use the old disk unless there's really a problem with it. Thanks Johnnie. The parity sync / data rebuild just completed with the replacement drive sdg replacing the disabled sdi as Disk 6. To answer trurl's question: no, I did not do a new config, either when I added Disk 7 + the unassigned spare or when I used the spare to replace the disabled Disk 6. When I added the new drives I simply stopped the array and did that from the main page (which prompted me to clear and format Disk 7 before adding it to the array). Then, after the errors and Disk 6 being disabled during the parity check that immediately followed, I stopped the array again and assigned the spare as the new Disk 6, which prompted me to do the parity sync / data rebuild. (I don't recall any prompt to clear the drive before that, so I just assumed it wasn't necessary when adding a replacement drive.) So what next? Now that the data rebuild has completed, the main page is telling me that everything is good (parity valid and all drives green balled). But from what you and trurl are saying, it sounds like the data on the new Disk 6 would have to be corrupted if it was restored from parity that was incorrect following the last aborted parity check, which means that the current parity would have to be incorrect now as well, yes? So is there any way to still ensure the data on the new Disk 6 is actually correct and that parity is good also? (I hesitate to even attempt a new parity check at this point.) Or do I have to go out to the server, check the connections to the original Disk 6 and, assuming that was in fact the problem, do a new config to re-assign it back as Disk 6 and completely rebuild parity?
If it has to be the latter then I'm guessing I'll also need to unassign Disk 7 + the new replacement Disk 6 so that parity can be restored under what had been the old config (ie. the last time I knew parity was really good). But then after that, do I do another new config to add Disk 7 back or do I just add it again from the main page and then run another parity check on top of that? As you can tell I'm a little confused now so a quick step by step would really help if you don't mind. BTW VT-d is disabled already and I'm pretty sure I'm running the latest BIOS too but I'll double check. Thanks to you and trurl for your help and let me know if there's anything else I'm missing. I'll hold off on doing anything else until I hear back.
March 1, 2017 Community Expert You should check SMART for the old disk6. If it's good, I would do a new config with it and all other disks. The problem is that the SAS2LP issue can recur at any time, and if VT-d is already disabled and you're using the latest BIOS there's not much else you can do. Try a different PCIe slot if available, and if problems persist consider replacing the controller.
March 1, 2017 Author 37 minutes ago, johnnie.black said: You should check SMART for the old disk6. If it's good, I would do a new config with it and all other disks. The problem is that the SAS2LP issue can recur at any time, and if VT-d is already disabled and you're using the latest BIOS there's not much else you can do. Try a different PCIe slot if available, and if problems persist consider replacing the controller. Thanks again Johnnie. Any recommendations on a new card if I wind up needing to replace it? The Supermicro AOC-SASLP-MV8 that I've been using in my backup server hasn't given me any problems so far. Have I just been lucky on that, or are the issues really with this SAS2LP version only?
March 1, 2017 Community Expert I've been tracking this issue for some time, and although it only affects a very small number of users, it seems to occur at similar rates for both the SASLP and SAS2LP. They use the same driver, after all. So at the moment I would recommend an LSI-based controller.
March 1, 2017 Author 12 hours ago, johnnie.black said: I've been tracking this issue for some time, and although it only affects a very small number of users, it seems to occur at similar rates for both the SASLP and SAS2LP. They use the same driver, after all. So at the moment I would recommend an LSI-based controller. Got it. Thanks again. I think I'm good now, but I'm going to keep the thread open until I can get out to the server on Friday. Assuming it was just a bad connection and/or that I can get the controller to behave again, I'll do the new config, then come back and mark this solved. Fingers crossed.
March 1, 2017 Author On second thought, I decided to buy an LSI card (a new 9211-8i already flashed to IT mode, on eBay for $79). I'd been under the impression that most of the issues with the SAS2LP were just slow parity checks, which never affected me. But after finally reading through some of those threads this morning and comparing them to my own posting history, I'm now convinced that almost every issue I've had with unRAID over the past couple of years (some of which have led to considerable data loss) is tied directly to this buggy controller, and that continuing to rely on it at this point is a textbook example of Einstein's definition of insanity. So happy birthday to me, I get a new SATA controller. Meantime I'll go ahead and mark this solved now. Thanks again Johnnie and trurl for your help.
March 1, 2017 Author 4 minutes ago, ElJimador said: Meantime I'll go ahead and mark this solved now. Or I would have if I could find how to do that with the new format here. We no longer mark threads solved? Not a big deal of course. Just want to follow proper forum etiquette, whatever it is these days.
March 7, 2017 On 3/1/2017 at 11:58 AM, ElJimador said: Or I would have if I could find how to do that with the new format here. We no longer mark threads solved? I thought you just edited the title and added [SOLVED] in the front of it.