T800 Posted April 28, 2018
PSU died and fried 3 x 4TB HDDs! What now?
I installed a new PSU. The server won't autoboot from the USB stick anymore; I have to select it from the boot menu each time. I reset the CMOS and it still does it. Is the motherboard damaged?
I tried the HDDs on other power and SATA ports and got nothing. I also tried them in my other server to see if they would show up as unassigned devices. They spin up but don't appear as devices.
If the HDDs are completely dead, how do I get the server back up with the 2nd parity drive and without the other 2 data drives?
Squid Posted April 28, 2018
T800 said: "I reset CMOS and still does it."
Did you restore to factory defaults, or did you set the boot order to boot off the USB stick first?
T800 said: "2nd parity drive and without the other 2 data drives"
You can't. You have lost a total of 3 drives, but dual parity only lets you lose a maximum of 2 drives.
T800 Posted April 28, 2018
So what do I do? I'm guessing I can keep my data on the drives that are left. Do I preclear the 2nd parity drive to use as single parity, and then what?
BRiT Posted April 28, 2018
If the data on the dead drives is important, you might be able to swap circuit boards on the drives to get them working again. Google/Bing about that. From what I recall, if it's only the electronics on the drive that failed and not the internal physical media or the drive heads, then a circuit board replacement might work. I think you need identical model drives for the donor boards.
As for the drives that are left, I would at least run SMART tests on them to make sure they didn't suffer as well before going forward.
trurl Posted April 28, 2018
T800 said: "So what do I do, I'm guessing I can keep my data on the drives that are left? Do I preclear the 2nd parity drive to use as single parity and then what?"
Don't preclear. Just do a New Config, assign your remaining data disks to whatever data slots you want them to use, and assign the old parity2 disk to the parity slot. Don't check the box saying parity is already valid. Start the array to begin the parity sync.
The longer explanation: there is no need to preclear unless you just want to test the disk. unRAID only requires a clear disk when you are adding a disk to a new slot in an array with valid parity. A clear disk is all zeros, which has no effect on the parity calculation, so when you add a clear disk to a new slot, parity remains valid. If you add a disk to a new slot and that disk hasn't been cleared, unRAID will clear it before adding it so that parity remains valid.
When replacing a disk, whether it is parity or data, there is no need to clear the replacement, since it is going to be completely overwritten from the parity calculation anyway.
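To illustrate the point above about clear disks, here is a minimal Python sketch. It assumes plain bytewise XOR parity, which is an assumption made for the example rather than a statement about unRAID's internals; the function name `xor_parity` and the four-byte "disks" are made up and are not unRAID code.

```python
from functools import reduce


def xor_parity(disks):
    """Return the bytewise XOR of equal-length byte strings (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))


# Two tiny made-up "data disks" of four bytes each.
disk1 = bytes([0x12, 0xA0, 0xFF, 0x00])
disk2 = bytes([0x34, 0x0B, 0x0F, 0x7E])
parity_before = xor_parity([disk1, disk2])

# Adding a cleared (all-zero) disk leaves the parity unchanged,
# because x XOR 0 == x for every byte.
cleared = bytes(len(disk1))
parity_after = xor_parity([disk1, disk2, cleared])
print(parity_after == parity_before)

# Rebuilding a single missing disk: XOR the parity with the surviving disks.
rebuilt_disk2 = xor_parity([disk1, parity_before])
print(rebuilt_disk2 == disk2)
```

The same property explains why a replacement disk needs no clearing: its contents are recomputed from parity and the other disks and written over whatever was there.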
T800 Posted April 28, 2018
I tried swapping in the PCB from another drive and attached the drive to my other server, and it shows up. It reports the serial number of the drive the PCB came from, though. I will check the other two later when I get home.
T800 Posted April 28, 2018
When I put working PCBs on the 3 HDDs, they all appear as devices on the other server. They won't SMART test, though, and report:
=== START OF READ SMART DATA SECTION ===
Read SMART Log Directory failed: scsi error badly formed scsi parameters
Read SMART Self-test Log failed: scsi error badly formed scsi parameters
What should I do? The drives will mount but have issues. So far, the drives that were still working in the server that failed are passing their SMART tests.
Vr2Io Posted April 28, 2018
This is normal. You need to move an IC from the original PCB to the new one. But if that IC has also burned out, there is nothing you can do.
Frank1940 Posted April 29, 2018
You could try mounting these disks using the Unassigned Devices plugin and see how much data you can copy off them.
T800 Posted April 29, 2018
I'm going to get a PCB and try to save the drives. While I wait for that, I have set up the 2nd parity drive as single parity and started a new config with the remaining drives.
I'm getting quite a few UDMA CRC errors on drive 6. Its UDMA CRC error count is currently 5778. Should I stop the parity sync and check connections, or just let it finish? The parity write is slow at 9 MB/s.
Frank1940 Posted April 29, 2018
This is a tough one to answer at this point. I am assuming that the count is continuing to increase. A UDMA CRC error is an error in the transmission of data between the HD and the SATA controller, whether that controller is on the MB or on a SATA card. Usually these errors are caused by a bad SATA cable or a bad connection. (Occasionally, crosstalk can be an issue.) But in your case, with the (most likely over-voltage) problem, there could be more damage to the electronics than was first realized. I would suggest uploading the Diagnostics file to see if anyone can spot anything there.
At this point, you might want to take a deep breath and begin to analyze the entire situation. Is this a 'new server', or have you been using it for a few years? Have you been considering any upgrades to it? What is your risk tolerance, given that this system has been severely stressed and has possibly had its life (and reliability) reduced?
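One way to check whether the count is still increasing is to compare two readings of SMART attribute 199 taken some time apart. The Python sketch below is only an illustration: it assumes the usual ten-column attribute table printed by `smartctl -A`, the helper name `crc_error_count` is hypothetical, and the two sample lines are made up (only the 5778 figure comes from this thread).

```python
def crc_error_count(smartctl_text):
    """Return the raw value of attribute 199 from `smartctl -A` style text, or None.

    Assumes the common column layout:
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Made-up sample lines for illustration; real output will differ per drive.
earlier = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 5778"
later = "199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 5901"

delta = crc_error_count(later) - crc_error_count(earlier)
print("still rising" if delta > 0 else "stable")
```

The raw count never resets, so a large historical number matters less than whether it keeps climbing after the cable or connection has been changed.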
John_M Posted April 29, 2018
Personally, I'd stop and investigate. At 9 MB/s you can't be very far into the parity sync, and at that rate it will take days to complete if you just leave it.
What was the model of the power supply that caused so much damage? I would hope that a quality brand would be capable of failing gracefully.
T800 Posted April 29, 2018
It is increasing, and the estimate is 6 days until complete. I've stopped the array now.
The server is nearly 7 years old. The PSU was a Corsair CX430. It now has a new CX450 in it.
I am thinking about replacing the motherboard and all the cables. I guess that would be a good idea, and then what? New config and set it off again?
I've attached the diagnostics: tower-diagnostics-20180429-1538.zip
John_M Posted April 29, 2018
T800 said: "I got quite a few UDMA CRC error counts on drive 6."
Yes. That's your problem. The SATA link keeps being reset every few seconds and can't achieve a speed faster than 1.5 Gb/s. The first thing to try is a new cable. Apart from that, your motherboard appears to be working well enough.
T800 Posted April 30, 2018
I swapped the cable, and now the parity sync is estimated at 14 hours at 80 MB/s. So far so good.
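The two speeds reported in this thread can be sanity-checked with simple arithmetic. The Python sketch below assumes a 4 TB parity disk (an assumption based on the 4TB drives mentioned in the first post) and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes); `sync_hours` is a hypothetical helper, and it ignores the fact that real throughput varies across the disk.

```python
def sync_hours(disk_bytes, bytes_per_second):
    """Hours needed to write disk_bytes at a constant bytes_per_second."""
    return disk_bytes / bytes_per_second / 3600


PARITY_BYTES = 4 * 10**12  # assumed 4 TB parity disk, decimal units

for label, mb_per_s in (("bad cable", 9), ("new cable", 80)):
    hours = sync_hours(PARITY_BYTES, mb_per_s * 10**6)
    print(f"{label}: {mb_per_s} MB/s -> {hours:.1f} h ({hours / 24:.1f} days)")
```

Under those assumptions, 9 MB/s works out to a little over 5 days and 80 MB/s to just under 14 hours, which is in line with the estimates quoted above.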
T800 Posted April 30, 2018
Parity finished without errors. I'll try to fix the 3 fried HDDs when the PCBs turn up. Thanks for all the help, everyone.
John_M Posted April 30, 2018
Now that you've built parity, you ought to run a parity check to make sure it's OK, or just leave it if it's set to do an automatic check on the first of each month.
T800 Posted May 1, 2018
Okay thanks, will do. I actually stopped the monthly check this morning because the sync completed yesterday; I didn't realize it was worth running again.
I ran the "Fix Common Problems" plugin and it's reporting:
"Your CPU is running constantly at 100% and will not throttle down when it's idle (to save heat / power). This is because there is currently no CPU Scaling Driver Installed. Seek assistance on the unRaid forums with this issue"
I never had this before the PSU took a swan dive (or it could be from the 6.5.0 update a few days before the PSU went).
So far I still haven't been able to save USB as the first-priority boot device either. I'm going to try a firmware update on the motherboard (I noticed it's a couple of revisions old) and another USB port. If that doesn't sort it, it looks like I might have to get a new motherboard after all.
John_M Posted May 1, 2018
If you don't already have it, install the Tips and Tweaks plugin. You can select a CPU scaling governor there. I use the On Demand one, which allows maximum performance when it's needed but slows down the clock when the server is idle.
T800 Posted May 4, 2018
The parity check went fine. The server has been working great since.
I installed Tips and Tweaks and set the governor to On Demand. It still comes up as an error in Fix Common Problems, but the CPU is no longer showing 100% and the server now sleeps, which it hadn't done for a few months.
T800 Posted June 20, 2018
Just a quick update on this. I ordered 3 donor PCBs from China. Two weeks later they arrived, and I got a half-decent heat gun with a fine tip and swapped the BIOS chips over. One of the parity drives and one of the data drives came back to life. I figured the 3rd PCB might be faulty, so I told the seller and they sent another. It arrived on Monday, I swapped the BIOS chip, and voilà, that drive is back up and running too.
I'm back to where I was data-wise, with 2 parity drives, and I've just finished my 2nd parity check with 0 errors.
The only thing left is to get the server to boot from USB by itself. It's fine if I restart, but if I shut down and boot, I have to go to the boot menu and select USB. Even if I can't sort that, I can live with it for now.
Short version: I've managed to not lose any data.
Vr2Io Posted June 20, 2018
Good job, 8TB of data back.
T800 Posted June 20, 2018
Thanks. As stupid as it sounds, I hadn't even thought about it as getting 8TB back, just that I'd saved the faulty drives. It's a lot!
ijuarez Posted June 20, 2018
That's great that you got your data back. IMHO, I would start looking for a replacement motherboard. The fact that it will not let you boot off the stick and that it fried 3 drives (sort of) is, to me, a signal that there's something wrong.
Mat1926 Posted June 20, 2018
T800 said: "Just a quick update on this. I ordered 3 donor PCB boards from China."
I am glad that you recovered your data. Would you please give us the link to the online store? Thanks.