October 31, 2010
Hello, I just started a parity sync on my new unRAID box and it says it's going to take 181 days. I have no idea how to troubleshoot this. I've read about cables and checking DMA, but wondered if anyone can see anything obvious in the attached syslog?
Jon
Attachment: syslog.zip
October 31, 2010
The initial time estimate to calculate parity is very inaccurate. Pressing "refresh" might get you an updated estimate, but I'm not sure it will on the 5.0beta.
Joe L.
October 31, 2010 (Author)
Hi Joe L.,
This morning it had stopped. Judging by my syslog I have quite a few drive issues, so hopefully preclear will weed them out. I refreshed several times over the first hour and the estimated time was going up, not down. Any pointers from looking at my syslog would be appreciated. My attention is on /dev/sdc at the moment.
Jon
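For a sense of scale, here is a rough sketch of what a 181-day estimate would imply. It assumes the estimate is simply the remaining data divided by the current transfer rate (the thread does not confirm how unRAID computes it) and a nominal 2 TB parity disk. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def implied_rate_bytes_per_sec(disk_bytes: float, estimate_days: float) -> float:
    """Average rate needed to process disk_bytes in estimate_days."""
    return disk_bytes / (estimate_days * SECONDS_PER_DAY)

# ASSUMPTION: nominal 2 TB disk (2e12 bytes) and the 181-day figure from the first post.
rate = implied_rate_bytes_per_sec(2e12, 181)
print(f"{rate / 1000:.0f} kB/s")  # prints "128 kB/s"
```

Under those assumptions the sync would be crawling along at roughly 128 kB/s, far below what a healthy 2 TB drive sustains, so an estimate like this more likely reflects stalled I/O than a slow but working sync.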
October 31, 2010
All of your disks were reporting errors. Pre-clearing will not do anything to fix disk errors.

Your disk configuration is:

Oct 31 00:28:38 Tower kernel: md: import disk0: [8,64] (sde) WDC WD2001FASS-0 WD-WMAY00475665 offset: 63 size: 1953514552
Oct 31 00:28:38 Tower kernel: md: import disk1: [8,0] (sda) WDC WD20EADS-32S WD-WCAVY1047378 offset: 63 size: 1953514552
Oct 31 00:28:38 Tower kernel: md: import disk2: [8,16] (sdb) WDC WD20EADS-00R WD-WCAVY0067098 offset: 63 size: 1953514552
Oct 31 00:28:38 Tower kernel: md: import disk3: [8,32] (sdc) WDC WD20EADS-00R WD-WCAVY0282600 offset: 63 size: 1953514552

The syslog shows ata1, 2, 3 and 4 all timing out, and all continually being reset in an attempt to re-establish communications with the disks. ata1 = /dev/sda, ata2 = /dev/sdb, ata3 = /dev/sdc, and ata4 = /dev/sdd. The only disk not showing errors is your parity disk, /dev/sde.

Oct 31 00:30:13 Tower kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 31 00:30:13 Tower kernel: ata1.00: failed command: READ DMA EXT
Oct 31 00:30:13 Tower kernel: ata1.00: cmd 25/00:00:8f:71:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Oct 31 00:30:13 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 31 00:30:13 Tower kernel: ata1.00: status: { DRDY }
Oct 31 00:30:13 Tower kernel: ata1: hard resetting link
Oct 31 00:30:13 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 31 00:30:13 Tower kernel: ata2.00: failed command: READ DMA EXT
Oct 31 00:30:13 Tower kernel: ata2.00: cmd 25/00:00:57:76:00/00:04:00:00:00/e0 tag 0 dma 524288 in
Oct 31 00:30:13 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 31 00:30:13 Tower kernel: ata2.00: status: { DRDY }
Oct 31 00:30:13 Tower kernel: ata2: hard resetting link
Oct 31 00:30:13 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 31 00:30:13 Tower kernel: ata3.00: failed command: READ DMA
Oct 31 00:30:13 Tower kernel: ata3.00: cmd c8/00:c8:8f:75:00/00:00:00:00:00/e0 tag 0 dma 102400 in
Oct 31 00:30:13 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 31 00:30:13 Tower kernel: ata3.00: status: { DRDY }
Oct 31 00:30:13 Tower kernel: ata3: hard resetting link
Oct 31 00:30:13 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 31 00:30:13 Tower kernel: ata4.00: failed command: READ DMA
Oct 31 00:30:13 Tower kernel: ata4.00: cmd c8/00:c8:8f:75:00/00:00:00:00:00/e0 tag 0 dma 102400 in
Oct 31 00:30:13 Tower kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 31 00:30:13 Tower kernel: ata4.00: status: { DRDY }
Oct 31 00:30:13 Tower kernel: ata4: hard resetting link
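Spotting which ports are affected by eye gets tedious in a long syslog. The following is a minimal sketch, not an unRAID tool, that counts "failed command" lines per ATA port. The regular expression is written against the log lines quoted above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... kernel: ata1.00: failed command: READ DMA EXT"
FAILED_CMD = re.compile(r"kernel: (ata\d+)\.\d+: failed command: (.+)$")

def count_failed_commands(lines):
    """Return a Counter mapping ATA port name -> number of failed commands."""
    counts = Counter()
    for line in lines:
        match = FAILED_CMD.search(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts

# Three lines taken from the syslog excerpt in this thread.
sample = [
    "Oct 31 00:30:13 Tower kernel: ata1.00: failed command: READ DMA EXT",
    "Oct 31 00:30:13 Tower kernel: ata1: hard resetting link",
    "Oct 31 00:30:13 Tower kernel: ata3.00: failed command: READ DMA",
]
counts = count_failed_commands(sample)
for port, n in sorted(counts.items()):
    print(port, n)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_failed_commands(open("syslog"))`, where the file name is whatever you extracted from the attachment. Failures spread evenly across several ports would be consistent with a shared cause such as the controller or power, rather than one bad disk.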
October 31, 2010 (Author)
Joe L.,
This is bad news. They are all relatively new WD Green drives. They have been in my WHS for about a year. How should I go about addressing this? From what I've read it could be one of many things:
1) bad controller
2) bad cables
3) bad power cable/supply
4) bad drive
From what you're seeing, which would you suggest is the most likely culprit? Will unRAID work properly under these conditions, and what would your next move be?
Jon
October 31, 2010
It is unlikely to be a specific bad disk, but a misbehaving disk could affect others on the same disk controller. I see you have a lot more disks not yet assigned. What power supply are you using? Is it sized adequately for all the disks? The fact that it is 4 disks might point to the disk controller... again, just theory, as I have no idea where things might be connected. It could be as simple as the disk controller card (if it is a card) not being seated well in its slot.
October 31, 2010 (Author)
OK, thanks. So, to troubleshoot this, a good approach would be:
1) Check the seating of the controller card, and check that the cables are seated properly.
2) Disconnect all drives and add them back one at a time.
3) Preclear each disk and add it to the array one at a time.
4) This should help me determine whether the issue is a bad disk or a bad controller?
I've just seen the AOC-SASLP-MV8 PCI Express x4 card on NewEgg. Currently I am using the AOC-SAT2-MV8, which I bought second-hand on eBay, so I may change the controller anyway. I see unRAID recommends the AOC-SASLP-MV8 PCI Express x4.
Jon
November 1, 2010
I am no MASTER... but here are my 2 cents...
* Go to the WD website and download their tool. Do a ZERO fill. (This tool will let you know if there are errors. It takes a long time, so you might want to leave it for last, or NOT.)
* Make sure that your power supply can handle the load of the four drives. (4 drives at SPIN-UP should equal 4amps *3.5 * 4 = 56amps at peak.) Get a good and stable PSU.
* Distribute the load of the drives across different rails of the power supply unit. (Make sure you do not start adding extra extensions to ONE rail.)
* As was said before, verify the seating of the cables and connections.
* At boot, make sure that the drives are recognized correctly by the BIOS, and also make sure that all settings within the BIOS are correct for the drives.
* Make sure DMA transfers are enabled in the BIOS. (READ, READ, READ)
* Also, make sure that the drive options (such as caching and speed) are not turned off by switches or software. (Remember MS was around.)
* Make sure that the controller card firmware is up to date to handle those drives. (WD Greens are new stuff.)
If this is not enough...
* Take every drive by itself and do a pre-installation of unRAID, by itself, in another system. If everything goes well, you have your killer; if not, you also have your killer. This last one should point you to a bad MB, a bad controller card, or something specific to your setup. Remember that you will not damage any other system you test your drives on, since you will be booting from FLASH. Go test somewhere else to check for faulty HW.
The pointers I give you simply look for things that one might overlook. Good luck... I'll keep reading your post...
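The spin-up budget mentioned above can be checked with a small sketch. It assumes all drives spin up at the same moment and that the peak 12 V current per drive is taken from the drive's datasheet. The 2.0 A used here is only a placeholder, not a WD specification, and the function name is made up for illustration.

```python
def peak_12v_amps(drive_count: int, spinup_amps_per_drive: float) -> float:
    """Worst-case 12 V current if every drive spins up simultaneously."""
    return drive_count * spinup_amps_per_drive

# ASSUMPTION: 2.0 A per drive is a placeholder; use the datasheet value instead.
print(peak_12v_amps(4, 2.0))  # prints 8.0
```

Compare the result with the 12 V rating of the rail that actually feeds the drive connectors, leaving headroom for the motherboard and anything else on that rail.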
November 1, 2010
Quote: "Distribute the load of the drives in different rails of the Power Supply unit. (Make sure you do not start adding extra extensions to ONE rail...)"
This is almost never possible unless you re-wire a multi-rail power supply yourself. I've never seen a power supply where the Molex disk power connectors are on more than one rail. In fact, they often share the same rail as the motherboard. The second and subsequent rails are usually reserved for the PCIe power connectors used for power-hungry video cards.
Here is an example of a power supply I purchased that suited MY needs. It is a 4-rail supply. The ONLY rail used for all the SATA and Molex power connectors also powers the motherboard. Without re-wiring it, the additional capacity of the other rails is not available for disk drives. See this post for details: http://lime-technology.com/forum/index.php?topic=6879.msg66778#msg66778
Most power supply manufacturers do not provide the detail of how they distribute multi-rail 12 volt power to their connectors. (I have no idea why, but they do not.) The OCZ supply I purchased did describe how they used the four 12 volt rails. Eye-opening, isn't it? Their target market is not somebody with a lot of disk drives... it is a hard-core gamer with power-hungry video cards.
November 1, 2010
Great detail, Joe L., which stresses the point of being very careful when choosing your PSU. I have had many issues with equipment underperforming because of badly designed power distribution and/or a miscalculation of the power needed. All this info will make it easier for "nojstevens".
November 1, 2010 (Author)
Morning both,
I am running a Corsair TX750W PSU inside a Norco 4220 case. Currently 5 SATA drives and 1 SSD are connected. The Norco has 2 connectors on each of its 5 backplanes. I have direct (unsplit) connectors on 3 of the 5 backplanes but have now run out. The plan was to split one into two to complete the power. I've used this PSU with 8 drives before on WHS.
Current status is that I've ordered the AOC-SASLP-MV8 from NewEgg. It should be here tomorrow. The AOC-SAT2-MV8 I am using now is a PCI-X card that is plugged into a PCI slot, and it's 5 years old. The new card will go in a PCI-E slot and I have two new SAS cables to connect straight through.
So today I am pre-clearing my 2 TB WD Green and a 2 TB WD Black so I can hopefully add them directly to my new setup - but that pre-clear takes a while, doesn't it...
So, if my understanding of your advice is correct: if I have a pre-cleared disk with no errors, a new PCI-E card and new cables, and it still doesn't work, the most likely culprit is the PSU?
I am an optimistic kind of person, so I am confident I will report success tomorrow. Thanks for your advice so far.
Jon