chickensoup Posted June 3, 2019 (edited)
Help please

I recently transplanted my hardware into a new case and at the same time changed the board, CPU and memory. Initially I had a problem where one of the controller cards was showing all disks on boot but not in Unraid (displaying as missing). I updated the BIOS on the board from F3 to F8 and re-seated the card and cables, which seemed to work. Everything booted up OK and all disks went green. I allocated a new/spare disk as a cache drive, which formatted OK, and I had green lights across the board, no errors.

A few hours later, after some light Plex use, Parity 2 dropped out first (red X). I figured maybe the cable was bad, and I was dealing with dinner/son/etc., so I left it for the moment. Shortly thereafter, when I got a chance to check it, the shares had dropped off and I had read errors across most of the disks.

I'm wondering if the BIOS is set up slightly differently on the new board (legacy, IDE mode, etc.), so tomorrow afternoon I'll compare against the old board, but I'm mostly at a loss as to what is going on. The motherboard changed from a Gigabyte GA-H57M-USB3 (rev 2.0) to a Gigabyte GA-H67MA-USB3-B3 (rev 1.0), and I ran a few passes of memtest on the new board yesterday without any issues.

Edit: I unassigned Parity 2 since it was 'dropped', and after a reboot and replacing a couple of SATA cables (both parity drives) it was looking OK, but then after today there are errors all over the place again. The syslog is from the first time it failed; the diagnostic is from after the reboot. Now disk 8 has dropped completely and I have no idea what is going on. Added an extra screenshot.

Attachments: unraid-syslog-20190603-1428.zip, unraid-diagnostics-20190607-0945.zip
Edited June 16, 2019 by chickensoup: Solved.
JorgeB Posted June 3, 2019
Please post the diagnostics: Tools -> Diagnostics
Frank1940 Posted June 3, 2019
And what card are you using for the additional SATA ports beyond the motherboard ones?
chickensoup Posted June 3, 2019 (Author)
I'm using two Adaptec 1430SAs. The thing is, Parity 1 and Parity 2 are on the same card, yet one has errors and one doesn't. Drives on the second card have errors and so do some on the motherboard, but others don't...
chickensoup Posted June 7, 2019 (Author)
Sorry for the late reply, been really busy with work. I've updated the OP with the diagnostic taken after booting the server back up last night. It looked OK initially, so I ran a non-correcting parity check overnight. In the morning all disks were OK at about 30% with 8 errors detected, so I stopped the check and changed to a correcting parity check, which I now regret. I'm hoping the data isn't corrupt.

This afternoon it looks just as bad as the other day, only with different disks. Please note that between the two screenshots/reboots I also tidied up the cabling, so the specific disks aren't necessarily on the same ports as they were the first time. Apologies if this makes things a little messier to diagnose, but the logs should clear up any confusion.
JorgeB Posted June 7, 2019
Multiple SATA links are going down, on multiple controllers, so my first guess would be a power problem. Also change the onboard SATA controller to AHCI mode.
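For anyone wanting to check the same pattern in their own syslog, here is a minimal sketch, not anything taken from the diagnostics in this thread. It assumes the kernel logs link trouble with the usual libata wording ("hard resetting link", "SATA link down"), which can vary between kernel versions. The function names and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on typical libata kernel messages (an assumption;
# exact wording can differ between kernel versions).
LINK_EVENT = re.compile(
    r"\b(ata\d+)(?:\.\d+)?: (hard resetting link|SATA link down)"
)


def count_link_events(syslog_text):
    """Count link-level events per (port, event) pair in syslog text."""
    counts = Counter()
    for line in syslog_text.splitlines():
        match = LINK_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


def ports_with_events(counts):
    """Return the affected ataN ports, sorted by port number."""
    return sorted({port for port, _ in counts}, key=lambda p: int(p[3:]))


# Made-up sample lines, for illustration only.
SAMPLE = """\
Jun  3 14:01:12 Tower kernel: ata3: hard resetting link
Jun  3 14:01:18 Tower kernel: ata3: SATA link down (SStatus 0 SControl 300)
Jun  3 14:02:40 Tower kernel: ata7: hard resetting link
Jun  3 14:02:41 Tower kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun  3 14:03:05 Tower kernel: ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
"""

if __name__ == "__main__":
    counts = count_link_events(SAMPLE)
    for (port, event), n in sorted(counts.items()):
        print(f"{port}: {event} x{n}")
    print("ports affected:", ports_with_events(counts))
```

If the affected ports turn out to be spread over several controllers, a shared cause such as power becomes more plausible than a single bad cable or card.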
chickensoup Posted June 7, 2019 (Author, edited)
I will change the onboard SATA controller to AHCI tonight. Is there anything else I should look at before I reboot it? It is still currently powered on.

Not to over-complicate things, but in full disclosure, the system is actually running off two power supplies, for no reason other than that the case supports them and I was testing power usage by balancing the load between the two. Based on what you have said, I suspect my TT 750 might be playing up, which is strange since it is actually powering less than it has been for the last few months. I tested both the other night after it first failed and they looked OK, but I might try swapping them around to see if this fixes anything.

More info here >
Edited June 7, 2019 by chickensoup: More info
JorgeB Posted June 7, 2019
42 minutes ago, chickensoup said: anything else I should look at before I reboot it?
Nothing else that comes to mind.
chickensoup Posted June 9, 2019 (Author, edited)
I actually tested both the power supplies before rebuilding the server and they looked OK, even under load, but it's curious that, other than Parity 2 (which dropped due to a SMART 199 error, which could be the SATA cable), all the other disks with errors are on the same PSU. Disks 10, 11, Cache and Parity 1 are all on a different PSU and show no errors.

I have another power supply I can use, but now I'm not really sure how best to proceed with my disks having dropped out all over the place, i.e.:
- Parity 2 dropped out, so I unassigned it for now.
- After the reboot, when all looked well, I ran a correcting parity check which fixed ~10 errors, but when I checked the server after it had finished there were read errors showing on all the data disks. I'm not sure if I can trust that my parity is valid at this stage and I'm not sure when the errors started happening. Can anyone tell from the diagnostic?
- Disk 8 dropped out (also not sure if this was after the parity check) but SMART looks OK. I'm running a full check on it now.

I'm not sure if I should dump the data off disk 8 and rebuild it from parity, as I feel like I actually trust the data disk more than the current parity state. Is there an option to reintroduce the disk to the array and rebuild parity off the data, assuming the disk is OK?

Sorry if any of the above is confusing. I've just never had so many errors all at once; it's been rock solid up until now (going on 10 years...).

Edit: Photo of the setup attached, if anyone is curious. Disk 8 is missing as I'm running a WDDiag on it at the moment.
Edited June 9, 2019 by chickensoup: Photo added
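As a side note on the SMART 199 reading mentioned above: attribute 199 is normally reported as UDMA_CRC_Error_Count, and a rising raw value is usually taken to indicate a link or cabling problem rather than a failing disk surface. The sketch below pulls raw values out of an attribute table in the layout printed by `smartctl -A`. It assumes the standard ten-column layout, the helper name is hypothetical, and the sample table is made up rather than taken from the posted diagnostics.

```python
def parse_smart_raw_values(smartctl_output):
    """Map attribute ID -> (name, raw value) from a `smartctl -A` style table.

    Assumes the usual ten-column layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row ("ID# ...") and any other text are skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                raw = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer, ignore this row
            attrs[int(fields[0])] = (fields[1], raw)
    return attrs


# Made-up sample table, for illustration only.
SAMPLE_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       27
"""

if __name__ == "__main__":
    attrs = parse_smart_raw_values(SAMPLE_TABLE)
    name, raw = attrs[199]
    print(f"{name}: raw value {raw}")
```

Comparing the raw value of attribute 199 before and after swapping a cable is one way to see whether the count has stopped climbing.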
JorgeB Posted June 9, 2019
1 hour ago, chickensoup said: Is there an option to reintroduce the disk to the array and rebuild parity off the data, assuming the disk is OK?
Yes, you can do a new config and re-sync parity.
Vr2Io Posted June 9, 2019 (edited)
I suspect the problem comes from using dual PSUs with a poor common DC ground. I suggest connecting the grounds of the two PSUs together, e.g. joining a black wire from each PSU via a Molex plug, although the PSU sync plug and the metal case should already do that.

There is also another possible reason: one PSU is powering only a few disks. If that PSU does not use a DC-to-DC design, then at such a low load its voltage regulation may go out of range. Which PSU model is powering only the disks?

And did the new mainboard pass a memory test?
Edited June 9, 2019 by Benson
chickensoup Posted June 12, 2019 (Author, edited)
Sorry for the long reply, but I think I've finally worked out what has happened.

It always felt like it was power related, but the one thing I could never understand was why I was getting errors on some disks but not others, even when they were connected to the same chain off the same power supply cable. I thought at one point that bending the cables into shape to fit the case might have had some impact, since the power supplies are reasonably old (though good quality).

It took a few days of thinking about the symptoms and scratching my head. The comment about poor ground also got me thinking, and then while sifting through my power supplies and cabling I had a realization.

I had 3 x modular 6-pin to SATA cables connected to my ToughPower 750W power supply. It turns out the power supply likely only shipped with two of these, and the one additional cable must be from a different PSU with a slightly different pin-out (I'm pretty sure there is no damage to the drives). My guess is the drives had 12V and Ground but the 3.3V and 5V lines were swapped around. I feel like such an idiot.

The picture below shows the two TT cables connected to the power supply (top row G, 12V, G on each) and two more modular SATA cables I had in my stash. If I had to put money on it, I would guess that the one on the right-hand side in the picture was also being used, which is why the symptoms were so strange. The voltage was close enough to be OK for a little while, but ultimately not what the drive was looking for.

Edit: All of the cables below are SATA to 6-pin modular PSU cables. The one on the right in my hand looks, at a distance, pretty much identical to the ones which shipped with the PSU.
Edited June 12, 2019 by chickensoup
JorgeB Posted June 12, 2019
Glad you found the problem, very lucky not to damage the disks.
chickensoup Posted June 12, 2019 (Author)
5 minutes ago, johnnie.black said: Glad you found the problem, very lucky not to damage the disks.
Thanks!

On 6/9/2019 at 7:05 PM, johnnie.black said: Yes, you can do a new config and re-sync parity.
I have a quick question about rebuilding the array when I boot it back up. Assuming I trust the data on Disk 8, am I able to run a new config and rebuild both parity drives (at once) based on the data that is currently on the disks? Disk 8 is showing a red X (as per the second screenshot in my OP), but I don't think there is an issue with the drive at all; I've tested it outside of the array. I certainly don't trust my parity right now, but I want to re-introduce all the disks.
JorgeB Posted June 12, 2019
11 minutes ago, chickensoup said: Assuming I trust the data on Disk 8, am I able to run a new config and re-build both the parity drives (at once) based on the data that is on the disks currently?
Yes, new config will reset the disabled disk.
chickensoup Posted June 16, 2019 (Author)
I was able to generate a new config and rebuild parity to both parity disks without any issue. All disks are up and there doesn't appear to be any issue with the data.

Thanks for all your help, guys. Marked as Solved.