fonzie Posted December 30, 2015

I have been having some issues with my unRAID build these past few months, and I've been pulling my hair out trying to resolve them. I've been troubleshooting for months now and finally realized that I haven't used the best help available, which is all of you experts on this forum.

So let me give you a little backstory. I had been running my original unRAID build for a few years with absolutely no problems, but it was time for an upgrade because I wanted to use all the nice new features that unRAID 6 brought to the table (VMs, Dockers, etc.).

My OLD, original, underpowered hardware:
Motherboard: ASUS M4A785-M
CPU: AMD Sempron 2.8GHz
RAM: 2GB Corsair XMS 675MHz (2x1GB sticks)
SATA ADAPTER: Supermicro AOC-SAS2LP-MV8
SATA ADAPTER: SATA2 Serial ATA II PCI-Express (Silicon Image SIL3132)
PSU: Corsair 650W
Total of 10 drives (including parity and cache)

I upgraded quite a few things in my box, namely the motherboard, RAM, CPU and power supply, and I added two video cards and a new case. The only original things I kept were the drives, the expansion cards and, of course, the flash drive.

Shortly after, I started experiencing problems with drives. The first time it happened, 2 drives "failed" on me, so I thought I had data loss and replaced them. It happened again soon after, and I realized it was very unlikely that that many hard drives would fail. So I ran preclears on my two original "failed" drives on a separate computer and they passed, which confirmed my suspicion that the problem was some piece of hardware in my new unRAID build and not the drives themselves. I began to systematically test each hardware component to rule out issues:
- Tested the PSU: it is working properly and does have enough juice to power my entire rig (feel free to cross-check this, as I may be wrong).
- Swapped out the two SAS to SAS 36-pin cables, and that seems to be working fine as well.
- Swapped out and tested both the SAS RAID controller card and the SATA2 RAID controller card with my buddy, who has two identical cards, and that doesn't seem to be the problem. I also moved them to different slots on the motherboard.
- Changed some BIOS settings on the SAS controller to the same settings my buddy has on his (he owns the same card and has no issues).

*****One thing to mention is that I sometimes get an "Error PD device not ready" message when booting up unRAID, and I must press any key on the keyboard to continue booting. I've noticed that the times this happens are usually the times I have missing or disabled drive issues. This didn't use to happen with my old unRAID setup. Curiously enough, my friend sometimes gets the same "Error PD device not ready" notification, but he does not have any drive issues at all (he does have a different motherboard and CPU, though).

*****The errors have occurred on different drive trays, so they are not isolated to the same slot every time.

At this point, I'm thinking it may be a motherboard issue (maybe the PCI slots cannot supply enough to all the drives I have plus the 2 GPUs I added), or maybe the backplanes on my new NORCO case are faulty?? Could it be a RAM issue, or the Norco reverse breakout cables?

Here's my setup for reference. Maybe some keen eyes can find something that I overlooked or am not aware of:

M/B: Gigabyte Technology Co., Ltd. - 990FXA-UD3 http://www.newegg.com/Product/Product.aspx?Item=N82E16813128514
CPU: AMD FX-8350 Eight-Core @ 4000 MHz http://www.newegg.com/Product/Product.aspx?Item=N82E16819113284&cm_re=amd_8350-_-19-113-284-_-Product
RAM: 16384 MB (max. installable capacity 32 GB) http://www.newegg.com/Product/Product.aspx?Item=N82E16820148540
GPU1: SAPPHIRE Radeon HD 4830 DirectX 10.1 100265L 512MB 256-Bit GDDR3 PCI Express 2.0 x16 http://www.newegg.com/Product/Product.aspx?Item=N82E16814102822
GPU2: EVGA 02G-P4-3658-KR GeForce GTX 650 Ti BOOST SuperClocked 2GB 192-bit GDDR5 PCI Express 3.0 http://www.newegg.com/Product/Product.aspx?Item=N82E16814130910
SATA ADAPTER: Supermicro AOC-SAS2LP-MV8 Add-on Card, 8-Channel SAS/SATA Adapter with 600MB/s per Channel http://www.amazon.com/Supermicro-AOC-SAS2LP-MV8-8-Channel-Adapter-Channel/dp/B005B0Z2I4/ref=sr_1_10?ie=UTF8&qid=1451497676&sr=8-10&keywords=sas+card
SATA ADAPTER: SATA2 Serial ATA II PCI-Express RAID Controller Card (Silicon Image SIL3132) http://www.monoprice.com/product?p_id=2530
CABLES: 2x Norco C-SFF8087-4S Discrete to SFF-8087 Reverse Breakout Cable http://www.amazon.com/Norco-C-SFF8087-4S-Discrete-SFF-8087-Breakout/dp/B002MK7F0Y/ref=sr_1_1?ie=UTF8&qid=1451505475&sr=8-1&keywords=reverse+breakout+cable+norco
CABLES: 2x 1m 30AWG Internal Mini SAS 36-Pin SFF-8087 Male to Mini SAS 36-Pin SFF-8087 Male Cable http://www.amazon.com/gp/product/B008VLHOR2?psc=1&redirect=true&ref_=oh_aui_detailpage_o00_s01
PSU: Corsair RM750 http://www.newegg.com/Product/Product.aspx?Item=N82E16817139055&cm_re=corsair_rm750-_-17-139-055-_-Product
Case: NORCO 4224 http://www.amazon.com/NORCO-Mount-Hot-Swappable-Server-RPC-4224/dp/B00BQY3916/ref=sr_1_1?s=pc&ie=UTF8&qid=1451507089&sr=1-1&keywords=norco+4224
Total of 12 Drives (including Parity and Cache)

Link to full-size image: http://i.imgur.com/YrWzxaH.jpg

I just got a missing drive error this morning when I started a parity check, so I shut the server down and came here for help. I can supply additional information for each drive in my array if that would help in determining the problem. Just let me know. Thanks.
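Since the post invites a cross-check of the PSU, here is a rough worst-case power estimate for the listed parts. Apart from the FX-8350's 125 W TDP, the 750 W rating and the drive count taken from the post, every figure is an assumption (typical published or rule-of-thumb values, including ~2 A on the 12 V rail per 3.5" drive at spin-up), so treat this as a sketch to refill with datasheet numbers rather than a verdict.

```python
# Rough worst-case power estimate for the build in the first post.
# Every wattage below is an ASSUMED typical/rule-of-thumb figure, not a measurement;
# swap in datasheet values for your own parts.

PSU_W = 750                    # Corsair RM750 (from the post)
DRIVES = 12                    # 12 drives incl. parity and cache (from the post)
SPINUP_W_PER_DRIVE = 2.0 * 12  # assumed ~2 A on the 12 V rail per 3.5" drive at spin-up

components_w = {
    "CPU: AMD FX-8350 (125 W TDP)": 125,
    "GPU2: GTX 650 Ti Boost (assumed ~134 W)": 134,
    "GPU1: Radeon HD 4830 (assumed ~95 W)": 95,
    "Board, RAM, fans, HBAs (assumed ~60 W)": 60,
    f"{DRIVES} drives spinning up at once": DRIVES * SPINUP_W_PER_DRIVE,
}


def budget(parts, psu_w):
    """Return (estimated peak draw in W, remaining headroom in W)."""
    total = sum(parts.values())
    return total, psu_w - total


if __name__ == "__main__":
    total, headroom = budget(components_w, PSU_W)
    for name, watts in components_w.items():
        print(f"{name:42s} {watts:6.0f} W")
    print(f"{'Estimated worst case':42s} {total:6.0f} W")
    print(f"{'Headroom':42s} {headroom:6.0f} W")
```

With these assumed numbers the worst case comes to about 700 W, inside 750 W but without much margin. That figure stacks both GPUs at full load on top of every drive's spin-up surge, which is brief and mostly happens at power-on while the GPUs are near idle, so it is a pessimistic bound rather than a diagnosis.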
JorgeB Posted December 30, 2015

There have been a few issues with the SAS2LP and v6; they only affect a few users. Were all the disks that failed connected to the SAS2LP?
fonzie Posted December 30, 2015 Author

I'm not sure. I've had failures on lots of different drives, so each time I would just go to Tools --> New Config, take note of the drive order, restart, and set up the drives in the same order again. What I'll do right now is move the two SAS cables from the top two backplanes to the bottom ones and see if there is an issue. I'll get back to you.
JorgeB Posted December 30, 2015

If another drive fails, go to Tools > Diagnostics and post the complete zip. That can help diagnose the issue, and it will also show which controller the disk was connected to.
fonzie Posted December 30, 2015 Author

That is the drive that failed. I just swapped the cables that were connected to the backplanes of the Norco: I moved the top ones, which were connected to the SATA2 Serial ATA II PCI-Express card, to where the SAS2LP cables were. I restarted twice and haven't seen the "Error PD Device Not Ready" notification. I'm going to set a new config so that I can have all my drives functioning again, and then I will wait for another drive to give me an error. Once it does, I will do as you suggested and post the complete zip from the diagnostics page.
fonzie Posted December 31, 2015 Author

Okay, so I just had another drive failure. I went to Tools --> Diagnostics and saved the zip file. My concern is that the zip contains passwords and information from some of my Dockers. I want to post the zip as soon as possible, so can someone please tell me whether it is safe to do so, or which files I should exclude from the zip? Thanks.
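One way to check before posting is to search every file inside the zip for password-like strings. The sketch below does that with Python's standard `zipfile` and `re` modules. The keyword pattern is a hypothetical starting point, not a list of what unRAID actually writes into the diagnostics, so extend it with your own usernames, hostnames and keys.

```python
import re
import sys
import zipfile

# Hypothetical, deliberately broad pattern; add your own usernames, hostnames, keys.
SENSITIVE = re.compile(r"pass|secret|token|api[_-]?key", re.IGNORECASE)


def scan_zip(path):
    """Return (member name, line number, line) for every line matching SENSITIVE."""
    hits = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            # Decode leniently so binary members don't abort the scan.
            text = zf.read(name).decode("utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if SENSITIVE.search(line):
                    hits.append((name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for name, lineno, line in scan_zip(sys.argv[1]):
        print(f"{name}:{lineno}: {line}")
```

Run it against the saved diagnostics zip (for example `python scan_diag.py diagnostics.zip`, where both names are hypothetical). Because the pattern is broad, expect false positives such as "passed"; it only points at files worth opening before you decide what to remove.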
fonzie Posted January 2, 2016 Author

After doing more research, I think my Gigabyte motherboard might be causing the issue. It has dual BIOS, and I'm pretty sure it has created an HPA (Host Protected Area) on my drives. I've had it running like this for a few months now, so I don't know how much damage has been done to my hard drives by those hidden areas. I'm going to swap out the new Gigabyte motherboard for my original ASUS motherboard, which gave me no errors.

If that turns out to be the problem, how do I reverse the damage that the HPA has done to my hard drives? For example, how do I narrow down which drives were affected?
JorgeB Posted January 2, 2016

You can search your syslog to check for HPA. If present, it will appear like this:

ata1.00: HPA detected: current 3907027055, native 3907029168
ata1.00: ATA-8: SAMSUNG HD204UI, S2HFJ1BZ902507, 1AQ10001, max UDMA/133
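To narrow down which drives are affected, the search can pair each `HPA detected` line with the identity line the kernel prints for the same ataN.00 port, as in the Samsung example above. This is a sketch that assumes the line formats shown in this thread (`ATA-n:` or `ACS-n:` identity lines) and a default syslog path of /var/log/syslog; check both against your own log. Port numbers only mean something within a single boot, so feed it a log from one boot.

```python
import re
import sys

# Formats assumed from the log lines quoted in this thread.
HPA_RE = re.compile(r"(ata\d+\.\d+): HPA detected: current (\d+), native (\d+)")
ID_RE = re.compile(r"(ata\d+\.\d+): (?:ATA|ACS)-\d+: ([^,]+),(.*)")


def find_hpa(lines):
    """Return {port: (model, details, current, native)} for every port with an HPA line."""
    ids, hpa = {}, {}
    for line in lines:
        m = ID_RE.search(line)
        if m:
            ids[m.group(1)] = (m.group(2).strip(), m.group(3).strip())
        m = HPA_RE.search(line)
        if m:
            hpa[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    # Ports with an HPA line but no identity line get "?" placeholders.
    return {port: ids.get(port, ("?", "?")) + sizes for port, sizes in hpa.items()}


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"  # assumed location
    with open(path, errors="replace") as f:
        for port, (model, details, current, native) in find_hpa(f).items():
            print(f"{port} {model} [{details}]: {native - current} sectors hidden by HPA")
```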
fonzie Posted January 2, 2016 Author

Yep. This came up in my syslog:

Dec 30 13:15:35 media kernel: sas: Enter sas_scsi_recover_host busy: 0 failed: 0
Dec 30 13:15:35 media kernel: sas: ata11: end_device-1:0: dev error handler
Dec 30 13:15:35 media kernel: sas: ata12: end_device-1:1: dev error handler
Dec 30 13:15:35 media kernel: sas: ata13: end_device-1:2: dev error handler
Dec 30 13:15:35 media kernel: sas: ata14: end_device-1:3: dev error handler
Dec 30 13:15:35 media kernel: sas: ata15: end_device-1:4: dev error handler
Dec 30 13:15:35 media kernel: ata15.00: HPA detected: current 625140335, native 625142448

So what steps do I take now to rectify the problem? Obviously I will be swapping out the motherboard, but I want to preserve all my data and avoid any corruption if possible.
Squid Posted January 3, 2016

All the dev error handler lines are just informational; they are not errors. The only issue (and it's not a real issue unless the drive in question is your parity drive) is the HPA. There are plenty of threads around here on how to remove it so that you get that ~1 MB (2113 sectors) of storage back.
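The numbers in the HPA line show why it only matters for parity. The hidden area is native minus current sectors: 625142448 - 625140335 = 2113 sectors here, the same 2113 as in the Samsung example, which is about 1 MiB at 512 bytes per sector. Because unRAID requires the parity drive to be at least as large as the largest data drive, an HPA on parity can leave it slightly smaller than an identical data drive without one. A small sketch of both checks, assuming 512-byte logical sectors; the parity scenario in the main block is hypothetical.

```python
SECTOR_BYTES = 512  # assumption: 512-byte logical sectors, as on these drives


def hidden_bytes(current, native, sector_bytes=SECTOR_BYTES):
    """Bytes hidden by an HPA: (native - current) sectors times the sector size."""
    return (native - current) * sector_bytes


def parity_large_enough(parity_sectors, data_sectors):
    """unRAID needs parity to be at least as large as the largest data drive."""
    return parity_sectors >= max(data_sectors)


if __name__ == "__main__":
    current, native = 625140335, 625142448  # ata15.00 from the syslog above
    hidden = hidden_bytes(current, native)
    print(f"ata15.00: {native - current} sectors = {hidden} bytes (~{hidden / 2**20:.2f} MiB)")
    # Hypothetical: this drive used as parity for a same-model data drive without an HPA.
    print("parity OK:", parity_large_enough(current, [native]))
```

The hypothetical parity check prints False: losing even 2113 sectors is enough to make parity too small for an otherwise identical data drive, while on a data drive the same HPA only costs that ~1 MiB.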
This topic is now archived and is closed to further replies.