Red Dot

mr-hexen · August 14, 2012

Are those devices (ata16-21) on the same controller card?

Dan201 · August 14, 2012

How can I identify which drive bays those are? I know which disk is in each drive bay by its serial number, but I have no ata listings for them.

There are 5 drives in each cage, so it could be a drive cage that's at fault as well as the SATA card.

Joe L. · August 14, 2012

Unfortunately, there is no easy way. You can look in the syslog at the point where the ata devices are assigned; the disk IDs are interspersed there as well.

Dan201 · August 14, 2012

I've taken a look through pages of text from my old system log and my current one. I can see ata7 mentioned, as well as a few others, but no mention of serial numbers, just the drive model. My drives are all the same model.

Are there any plans to develop an interface for dealing with errors that doesn't involve huge amounts of text? It sounds like this fault isn't unRAID's fault, but unRAID makes it incredibly hard to track down. If it wasn't for these forums I would have no idea where to start at all.

Dan201 · August 14, 2012

Is there some way to get a list of the drives in ata allocation? That way I could disconnect various drives to determine which drives are having the problem.

Dan201 · August 17, 2012

Anyone?

Joe L. · August 17, 2012

It depends on the version of unRAID you are running. I posted this exact answer for the newer 5.0beta releases yesterday. There is no easy way for 4.7 or earlier, other than to examine the proximity of the messages in the syslog to the device assignments.

It is here: http://lime-technology.com/forum/index.php?topic=21493.msg195355;topicseen#msg195355

Joe L.
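
For the syslog approach described above, a small filter can cut the log down to just the lines the kernel tags with an ata port. This is only a sketch: the function name `ata_lines` is made up for illustration, and it assumes the kernel prefixes those lines with `ataN:` or `ataN.NN:`.

```shell
#!/bin/sh
# Hypothetical helper: keep only syslog lines tagged with an ata port
# (for example "ata7: " or "ata7.00: "). Reads syslog text on stdin.
ata_lines() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: '
}

# Example use (the path is an assumption; adjust to where your syslog lives):
#   ata_lines < /var/log/syslog
# Narrow it further to a single port, e.g. ata7:
#   ata_lines < /var/log/syslog | grep -E 'ata7(\.[0-9]+)?: '
```

This does not add serial numbers to the log. It only makes the device-assignment lines and the error lines for one port easier to read side by side.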

Dan201 · August 18, 2012

Thank you for replying.

I have entered the command and attached the result. It lists the drives with their 'sd' allocation, but I can't see how that corresponds to the ata allocation from the previous result. What am I missing?

Kind regards
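
Since the bays here are labelled by serial number, one way to tie an 'sd' name to a physical drive is the `/dev/disk/by-id` directory, where udev normally creates links whose names contain the model and serial. The sketch below assumes that directory is populated on the unRAID release in use; `disk_ids` is a hypothetical name, not an unRAID command.

```shell
#!/bin/sh
# Hypothetical helper: read "ls -l /dev/disk/by-id" output on stdin and print
# "<sdX> <id>" for whole disks only (partition links such as sdc1 are skipped).
disk_ids() {
    awk 'NF >= 3 && $(NF-1) == "->" && $NF ~ /^\.\.\/\.\.\/[hs]d[a-z]+$/ {
        dev = $NF
        sub(/.*\//, "", dev)   # strip the leading "../../"
        print dev, $(NF-2)
    }'
}

# Example use:
#   ls -l /dev/disk/by-id | disk_ids
```

A drive may appear more than once if it has several by-id links. Any of those entries identifies the same disk.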

Joe L. · August 18, 2012

On my server, running version "5.0-rc6-r8168-test2" it looks like this:

root@Tower2:/dev/block# ls -l /sys/block/[hs]d* | sed -e 's^.*/sys/block^/^' -e 's^/^ ^g' -e 's^\.\. devices ^^'

sda -> pci0000:00 0000:00:1e.0 0000:05:00.0 ata1 host0 target0:0:0 0:0:0:0 block sda
sdb -> pci0000:00 0000:00:1e.0 0000:05:00.0 ata2 host1 target1:0:0 1:0:0:0 block sdb
sdc -> pci0000:00 0000:00:1f.2 ata3 host2 target2:0:0 2:0:0:0 block sdc
sdd -> pci0000:00 0000:00:1f.2 ata4 host3 target3:0:0 3:0:0:0 block sdd
sde -> pci0000:00 0000:00:1d.7 usb2 2-4 2-4:1.0 host8 target8:0:0 8:0:0:0 block sde
sdf -> pci0000:00 0000:00:1f.2 ata5 host4 target4:0:0 4:0:0:0 block sdf
sdg -> pci0000:00 0000:00:1f.2 ata6 host5 target5:0:0 5:0:0:0 block sdg
sdh -> pci0000:00 0000:00:1f.2 ata7 host6 target6:0:0 6:0:0:0 block sdh
sdi -> pci0000:00 0000:00:1f.2 ata8 host7 target7:0:0 7:0:0:0 block sdi
sdj -> pci0000:00 0000:00:01.0 0000:01:00.0 ata10 host10 target10:0:0 10:0:0:0 block sdj
sdk -> pci0000:00 0000:00:1c.4 0000:03:00.0 ata11 host11 target11:0:0 11:0:0:0 block sdk
sdl -> pci0000:00 0000:00:1c.4 0000:03:00.0 ata12 host12 target12:0:0 12:0:0:0 block sdl

It is very easy to see the mapping here on my newer server.
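
If only the sdX-to-ataN pairs are wanted, the same `ls -l` listing can be reduced further. The following is a rough sketch, assuming the kernel exposes an `ataN` component in the `/sys/block` symlink target as in the listing above (older kernels may not); `ata_map` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "ls -l /sys/block/[hs]d*" output on stdin and
# print "<sdX> <ataN>" for every device whose sysfs path contains an ataN
# component. Devices without one (for example USB sticks) are skipped.
ata_map() {
    sed -n 's#.*/sys/block/\([hs]d[a-z]*\) -> .*/\(ata[0-9][0-9]*\)/.*#\1 \2#p'
}

# Example use:
#   ls -l /sys/block/[hs]d* | ata_map
```

If this prints nothing for a drive, its sysfs path has no `ataN` component and the ata number has to be worked out another way.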

Joe L. · August 18, 2012

What does it look like when you type:

ls -l /sys/block/sd*

Dan201 · August 18, 2012

I'm running RC5. Here are the other results.

Joe L. · August 18, 2012

Apparently rc5 uses a much older kernel version.

On my server:

root@Tower2:/dev/block# uname -r

3.4.4-unRAID

You seem to be on 3.0.35-unRAID.

Oh well, I tried.

Joe L.

Dan201 · August 18, 2012

Thank you for your help. At least I know how to track down the problem drives now; I just need to get hold of a newer version.

No parity protection until I get it.

Dan201 · August 19, 2012

Do you know if the previous versions of the RC used a newer kernel? I could downgrade for a while to find out which drives are causing the problem.

mr-hexen · August 19, 2012

rc5 uses 3.0.35

lainie · August 19, 2012

Kernel and File Versions

Dan201 · August 20, 2012

Thank you.

Is there an expected release date for RC6?

Joe L. · August 20, 2012

Yes... a bit over one month ago.

The first release of rc6 was July 10, 2012

http://lime-technology.com/forum/index.php?topic=21377.0

The second release of rc6 was July 21, 2012

http://lime-technology.com/forum/index.php?topic=21597.0

I suggest the second, even if you do not have the mptsas disk controllers, as it also contains other fixes.

Joe L.

Dan201 · August 20, 2012

Great! I was mistakenly looking in the downloads section for it.

I have re-run the command to find the ata allocations, and only one drive gets one. The drive that is connected directly to the motherboard comes up as ata2. All the others show port numbers instead, which I assume is because they are connected to SATA controller cards.

Looks like we've hit another wall. Am I really the first person to have a problem with drives that use SATA cards? How have others found out which drives were the problem?

My server is split into 3 drive cages with 5 drives in each, plus a separate parity drive.

Five drives are identified in the syslog as having an issue of some sort, which makes me think it's a drive cage that is at fault.

Can I remove each cage one at a time, rerun the parity check, and see which cage is causing the problem? At the moment the server crashes each time I run a parity check. If it runs successfully, can I assume that the missing drive cage is the faulty one? And will I face any problems adding the other drives back into the array afterwards?

Kind regards

JonathanM · August 20, 2012

You can NOT run a parity check with a missing drive, as there is nothing to check against. If you mean temporarily installing 1 drive at a time directly to the controller instead of installed in a cage, then that should work fine as a troubleshooting technique.

Dan201 · August 20, 2012

I need to eliminate one drive cage at a time. If I remove 5 drives, will there be an option to rebuild the parity instead of checking it?

And when I reassign the removed drives they should just go back without being wiped, correct?

JonathanM · August 20, 2012

I think it would be a better test to move all 5 drives to directly connect to the controller and check that way. A parity build is a different operation than a parity check, and may not stress the system in the same way.

Dan201 · August 21, 2012

To try to find the problem I removed one drive cage at a time and rebuilt the parity. Each time the server became unresponsive and required a hard reboot. This makes me think the problem isn't with the cages.

I then replaced the parity drive with a spare one. Again it failed.

Next I assigned the parity drive to a different bay, and I'm currently rebuilding the parity. It's at 3% so far, further than I've gotten in a long time.

So far around 1.2 TB of data has gone missing from the drives, with hundreds of thousands of errors now on disk 10.

I really wish I had stuck to WHS.

Dan201 · August 21, 2012

The parity build failed again.

At the moment the best I can do is start the array without attempting a parity build. That way I can at least access my files. Well, the ones that are left, anyway.

EDIT: The missing data has reappeared.

Dan201 · August 21, 2012

Would it be worth changing controller cards? I could probably get new ones, test them, and return them if it makes no difference.

Any recommendations?
