Problems with Disk 4 extreme slowdown

September 19, 2008

This is the first time I have had to ask for help with an unRAID failure. I will try to keep it short, as this has been going on for about a hair-pulling week now. I am on Pro version 4.3.3 (the problem started with 4.2).

I had a disk fail (#7). I replaced it with another disk and unRAID started rebuilding really, really slowly. I stopped the rebuild and, after much screwing around, decided I did not really need the data on this disk. I hit restore, and parity started to build really slowly, I mean 120KB slow. I checked the log and kept seeing the same thing, which looked strange: ata4.01 status (DRDY) ata4: soft resetting link ata4.00 configured for UDMA/133 ata4.01 configured for UDMA/33. I don't know what this means, but UDMA/33 is slower than UDMA/133, so I figured something was bad with disk 4.

Disks 3 and 4 were IDE drives; the 6 other drives are SATA. I unassigned disk 4 and did a restore. Still 120KB slowness. I figured I had a bad IDE controller (P5B-VM DO motherboard, BTW). I took off disk 3, so now there were no IDE drives, and hit restore: boom, 23000KB on the speed. Problem solved, so I ordered a new SATA drive.

The new drive came in today. I installed it, assigned it to disk 4, and formatted it with no problems. I started a parity check and, what do you know, 120KB speed. I moved the drive to a different SATA port on a PCI card with 2 other drives that work fine and assigned it to disk 4: no luck, 120KB. I left the drive where it was and assigned it to disk 10: 24000KB. It appears that if I assign a drive to disk 4 (and, I can only assume, disk 3) the system does not work correctly. Those are not physical places in the hardware, they are just places in software, right?

During the whole troubleshooting process I upgraded to version 4.3.3, and at some point I read a post by the admin about doing a "fresh start" by deleting a file, changing another, and starting over. None of these things fixed the problem. The only thing that seems to work is leaving disks 3 and 4 unassigned.

Sorry, no syslog. I will have to read up on how to save it to a file and upload it; I can view it on my monitor. But right now everything is working, until I want to add a disk 3 or 4, that is.

September 19, 2008

It's unclear what device 4 is connected to.

Is it a Promise PCI card or one of the onboard SATA ports?

From your message, the drive is not initializing correctly and/or there is a communication issue with the drive.

Double-check the cable and reseat each end. If it's a PCI card, reseat the card.

I've had drives go offline because they wiggled out of the socket due to vibration of the drive.

A syslog is needed to see how things are being set up.

September 19, 2008

Author

For months, Disk 4 was hooked to an onboard IDE port. When things started going wrong, I assigned disk 4 to a SATA drive on a motherboard port, then again to a SATA drive on a Masscool XWT-RC040 4-port PCI SATA card. In all cases, when I rebuilt parity it would rebuild at a speed of about 120KB. When I assigned the drive to disk 10 (mind you, I did not physically move the drive or any cables attached to it, I just assigned it to a different disk), I got a parity rebuild speed of 24000KB.

The drive does initialize correctly, but there definitely is a communication problem if it is assigned to disk 4. This has also happened so far with 3 different drives. They work if assigned to anything other than disk 4 or 3.

I have checked and double checked then replaced the cables.

I will try to post a syslog tonight if I can get some time.

September 19, 2008

I can only guess that this has something to do with the neighboring drive or controller on Device 2 or 3.

I.e., a device PRIOR to device 3 or 4's access is hogging the bus, timing out or retrying.

Consider this: when building parity, each drive is read sequentially, starting from Disk 1 through Disk n.

I would check all cables, reseating them if possible.

I would check all drive assignments on the cables if using IDE.

If you are using IDE drives with cable select, make sure single drives are on the LAST connector of the cable.

Dangling end points on cables could be an issue with signal reflections. This has always been an issue with SCSI drives.

I've seen it only a few times with IDE drives.

If you had two drives on the same cable and removed one, then there may be an issue with the master/slave assignments on the cable.

I've seen this affect drive speeds in the past too.

Sometimes I had to assign master to a drive even if it supported cable select.

I would run smartctl -d ata -a on devices 2 and 3, or on ANY devices on the IDE controllers.
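For anyone wanting to script this, here is a minimal Python sketch that only builds the smartctl -d ata -a command line for each device. The device paths (/dev/hda, /dev/hdb) and the helper name smartctl_command are placeholders of mine, not taken from this system, so substitute whatever your drives actually show up as.

```python
import subprocess


def smartctl_command(device):
    """Build the argument list for a full SMART report on one ATA device."""
    # -d ata : treat the device as ATA; -a : print all SMART information.
    return ["smartctl", "-d", "ata", "-a", device]


def run_report(device):
    """Run smartctl on a device and return its text output (needs smartctl installed)."""
    result = subprocess.run(smartctl_command(device), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical device paths; replace with the real ones on your server.
    for dev in ["/dev/hda", "/dev/hdb"]:
        print(" ".join(smartctl_command(dev)))
```

The script prints the commands rather than running them, so you can check them first; run_report is there if you want to capture the output to a file.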

September 19, 2008

Author

OK, scratch what I said about it only happening with a drive assigned to the disk 4 position.

When I got home I started a parity check and, sure enough, I am getting about 450KB on the speed. Last night with the same configuration I was getting 24000KB. Attached is a syslog. Any help would be much appreciated. I have no IDE drives in the system at this time. Before writing this I took WeeboTech's advice. I removed all SATA cables and reattached them. None seemed to be loose. I then rebooted the system and started a parity check. 450KB on the speed.

September 20, 2008

It's amazing how much a syslog can clarify the true situation! Can I gently and respectfully, to you and other users who may read this, stress the importance of obtaining a syslog at the first sign of trouble (see the Troubleshooting page)? I suspect in your case, someone could have pointed to the real problem within a day of posting a syslog, and saved you all that time and effort and possible data loss on the drive you removed.

It is your parity drive that is causing trouble, and it may have been the problem all along. The log messages you mention above are very similar to the messages in your syslog now, and are currently referring to the parity drive, sdi, a Seagate 750GB (serial ending in RKQ). The only difference is that it is currently associated with ata8.01, whereas before it was associated with ata4.01. The .01 indicates the second 'sub-channel', and that only occurs with a drive in the slave position of an IDE channel, or a SATA drive appearing to be in the second position in an IDE emulation, which your onboard ports appear to be in. I suspect you may have thought it was Disk 4 because of 'ata4', but unfortunately there is no relation between the ata numbers assigned by the kernel to hardware devices and the disk numbers that we assign to the drives. Disk 4 could have been associated with anything from ata1 to ata20-something. In this case, I believe ata4.01 was the parity drive, and ata4.00 was Disk 1. You mentioned this started with an earlier unRAID version. The kernel included with earlier versions generally assigned the onboard ports first, but the more recent ones seem to pick up the other controllers first, especially the JMB controller. That would probably explain the change from ata4 to ata8.
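To make the ataN.MM naming concrete, here is a small Python sketch that splits an identifier such as ata4.01 into its port number and its position on that port. The function names are my own, and the wording of the descriptions simply follows the explanation above; the sketch says nothing about which unRAID disk number a port maps to, because there is no such relation.

```python
import re

# Matches kernel identifiers such as "ata4.01" or "ata8.00".
ATA_ID = re.compile(r"\bata(\d+)\.(\d+)\b")


def parse_ata_id(text):
    """Return (port, device) for the first ataN.MM identifier in text, or None."""
    match = ATA_ID.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def describe(text):
    """Describe the identifier in plain words, following the post above."""
    parsed = parse_ata_id(text)
    if parsed is None:
        return "no ataN.MM identifier found"
    port, device = parsed
    if device == 0:
        position = "first position on the channel"
    elif device == 1:
        position = "second position on the channel (IDE slave or IDE emulation)"
    else:
        position = "device number %d" % device
    return "kernel port ata%d, %s" % (port, position)


if __name__ == "__main__":
    for sample in ["ata4.01 configured for UDMA/33", "ata4.00 configured for UDMA/133"]:
        print(sample, "->", describe(sample))
```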

Your syslog only records about 8 minutes, but already it has slowed the parity drive down to UDMA/33 (as you mentioned above), and seems on its way to further slowing, probably to PIO speeds, ridiculously slow as you have seen. In addition, when it has been sent resets, the whole ata8 channel was reset, which included both sdi the parity drive, and sdh Disk 1. Although it did not slow Disk 1 down, each reset is a delay, so that's 2 drives being delayed periodically.

I don't actually know what is wrong with the parity drive, just that it is causing 'exception Emask' errors (of the 'frozen' and 'timeout' variety), but in this short syslog it is not reporting media or device errors. I would check a SMART report on it, and run tests on it.
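As a rough illustration of what to look for, here is a Python sketch that counts suspicious lines per ata port. The three phrases it searches for are just the ones quoted in this thread, not a complete list of kernel error messages, and the sample lines are fragments quoted in the thread rather than a full syslog. The names SUSPECT_PHRASES and count_suspect_lines are my own.

```python
import re
from collections import Counter

# Phrases quoted in this thread as signs of trouble (not an exhaustive list).
SUSPECT_PHRASES = ("exception Emask", "soft resetting link", "configured for UDMA/33")

# Captures the port number from "ata5:" as well as "ata4.01".
PORT = re.compile(r"\bata(\d+)(?:\.\d+)?\b")


def count_suspect_lines(lines):
    """Return a Counter mapping ata port number -> number of suspicious lines."""
    counts = Counter()
    for line in lines:
        if not any(phrase in line for phrase in SUSPECT_PHRASES):
            continue
        match = PORT.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Fragments as quoted in this thread.
SAMPLE = [
    "Sep 20 20:12:38 Tower kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xa frozen",
    "ata4: soft resetting link",
    "ata4.00 configured for UDMA/133",
    "ata4.01 configured for UDMA/33",
]

if __name__ == "__main__":
    for port, hits in sorted(count_suspect_lines(SAMPLE).items()):
        print("ata%d: %d suspicious line(s)" % (port, hits))
```

To run it against a saved syslog, pass the open file object instead of SAMPLE. A port that keeps showing up is the one to check first, keeping in mind that a reset hits every drive on that channel.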

The drives attached to the JMB SATA ports appear to be configured for the fastest performance, using AHCI, with the connected WD 500GB linking at 3.0Gbps, but the Seagate 750GB next to it is only linking up at 1.5Gbps, so I suspect that it still has its SATA150 limiting jumper installed. See the Improving unRAID Performance page for help with that, and check both Seagates for jumpers.

Your onboard SATA ports look to me to be in an IDE emulation mode, so check your BIOS menus for a native SATA mode, possibly an 'Enhanced' SATA, or best of all is an AHCI mode. See the Hardware Compatibility page for a couple of links to BIOS settings for your board. From earlier posts, I believe you have an ASUS P5B-VM DO board? Others with that board may be able to advise you also. Ideally, you want your fastest SATA drives to be hooked to ports with AHCI support, and linking up at 3.0 Gbps. That would be your JMB ports and your onboard ports once configured for AHCI and jumpers removed. You can't speed up the SATA drives connected to the 4 port SATA150 PCI card, a SiI3114-based card I believe.

Once you get the parity drive problem solved, and the BIOS config and jumper issues taken care of, you should be getting parity check speeds of 50000KB/s and higher, perhaps over 60000KB/s with all SATA drives. The IDE drives will drag it down a little, but it should still be much better than the 24000KB/s you are seeing now.
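To put those speeds in perspective, here is a back-of-the-envelope Python sketch of how long one full pass would take. It assumes a 750GB parity drive, decimal units (1 KB = 1000 bytes, 1 GB = 1000^3 bytes), and a constant speed for the whole pass, none of which will hold exactly on a real server. The function name is my own.

```python
def parity_check_hours(drive_gb, speed_kb_per_s):
    """Estimate hours for one full pass over a drive at a constant speed.

    Assumes decimal units: 1 GB = 1000**3 bytes, 1 KB = 1000 bytes.
    """
    total_bytes = drive_gb * 1000 ** 3
    bytes_per_second = speed_kb_per_s * 1000
    return total_bytes / bytes_per_second / 3600


if __name__ == "__main__":
    # Speeds mentioned in this thread, in KB/s.
    for speed in (120, 24000, 50000):
        hours = parity_check_hours(750, speed)
        print("%6d KB/s -> %.1f hours (%.1f days)" % (speed, hours, hours / 24))
```

Under these assumptions a pass at 120KB/s would run for over two months, while 24000KB/s brings it under nine hours.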

You don't indicate how you determined that the original Disk 7 had failed. Since I believe that this parity drive has been causing trouble for a while, you may want to recheck your analysis of Disk 7, and see if the problems detected were actually the fault of this parity drive after all.

September 20, 2008

It's amazing how much a syslog can clarify the true situation!

What is needed is a viewer on emhttp which shows the syslog and highlights questionable messages.

We seem to spend a lot of time testing things out when, as you say, the syslog reveals the issues.

September 21, 2008

Author

Thanks so much for all the help. I am very sorry for not posting the syslog and I will always do so from now on. I replaced the 750GB parity drive with a 1TB Western Digital. Everything worked fine and I got the 24000KB speeds. I wanted to speed up my system, so I read all the links provided. I moved my drives around to get off the PCI card, which was going to limit the speed to 150, and I removed the jumper on my other Seagate drive. I started the system back up, reassigned the drives to the correct places and started a parity check. Back to the 150KB speed on a parity check. If I am reading the syslog correctly, it looks like it is ata5 causing the problem, which is my other Seagate 750GB. Last time it was ata8. Are ata5 and ata8 on the same controller? Where in the syslog can I tell this? Will the ATA numbers always assign to the same drives and controllers?

September 21, 2008

Author

Status report.

Moved the Seagate 750GB to a different SATA port. I have 3 Athena drive cages, so the moves are easy. Restarted the system, assigned the 750GB to its proper disk # and started a parity check. Blazing speeds of 44000KB. Alas, all is not good though. The syslog reveals what appear to be lots of errors on ata7, which is a brand new Hitachi 500GB with nothing on it. Of course the cursed Seagate 750 is at ata8.

Maybe I should just go to bed tonight and quit messing around.

September 21, 2008

This drive is still configured for 1.5 Gbps:

Sep 20 20:03:34 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)

Sep 20 20:03:34 Tower kernel: ata6.00: ATA-7: MAXTOR STM3500630AS, 3.AAE, max UDMA/133

It does seem as though ata5 is having trouble: many resets, and the link drops down to 1.5 Gbps.

It happens pretty early on:

Sep 20 20:12:38 Tower kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xa frozen

Then it eventually drops:

Sep 20 20:14:58 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

>> Will the ATA numbers always assign to the same drive and controllers?

No, it depends on the order in which the BIOS detects them. They usually stay the same. However, I've seen behavior on my machine where it shifts sometimes, but it's rare.

Also, you might want to check the speed jumpers on that STM3500630AS.
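If you want to check every port at once, here is a minimal Python sketch that pulls the "SATA link up" lines out of a syslog and reports ports whose most recent link speed is below 3.0 Gbps. The 3.0 Gbps threshold and the function names are assumptions of mine; a SATA150 controller or drive will legitimately link at 1.5, so treat a hit as something to look at, not proof of a jumper.

```python
import re

# Matches lines such as "ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)".
LINK_UP = re.compile(r"\b(ata\d+): SATA link up ([\d.]+) Gbps")


def link_speeds(lines):
    """Return {port: [speeds in Gbps, in log order]} for every link-up line."""
    speeds = {}
    for line in lines:
        match = LINK_UP.search(line)
        if match:
            speeds.setdefault(match.group(1), []).append(float(match.group(2)))
    return speeds


def slow_links(lines, expected_gbps=3.0):
    """Return {port: latest speed} for ports last seen below expected_gbps."""
    return {
        port: history[-1]
        for port, history in link_speeds(lines).items()
        if history[-1] < expected_gbps
    }


# The two link-up lines quoted in this post.
SAMPLE = [
    "Sep 20 20:03:34 Tower kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 300)",
    "Sep 20 20:14:58 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
]

if __name__ == "__main__":
    for port, speed in sorted(slow_links(SAMPLE).items()):
        print("%s last linked at %.1f Gbps" % (port, speed))
```

A port with more than one entry in link_speeds has renegotiated its link during the log, which is worth a look in its own right.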

September 21, 2008

I have 3 Athena drive cages so the moves are easy.

AHA!!! Now I'm getting a good picture.

The drives may not be seating properly or tightly enough.

Reseat them SLOWLY and carefully, making sure of full contact.

You might want to shine a bright light on the contacts to make sure there is no dust.

Shine a bright light inside the cage too.

Don't just slide them in; press them in gently but firmly to ensure contact.

How long have you had these cages?

September 21, 2008

Author

Had the cages since day 1, which was about Feb or March of this year. No problems since then; the tower was never touched except to press the on button after a power down for storms. The tower is on a UPS, but if the weather is going to be bad I do shut it off. Will give your suggestion a try tomorrow.

September 21, 2008

I think WeeboTech has a very good idea. None of the errors reported seem like physical drive errors; they all look like communication errors, in other words, problems between the drive and the controller. Plus there is the fact that when you moved the Seagate 750GB, the problems did not move with it, but went to a different drive. I would also check whether the drive reporting errors each time is in the same slot of the same Athena.
