mr-hexen Posted August 14, 2012: Are those devices (ata16-21) on the same controller card?
Dan201 (Author) Posted August 14, 2012: How can I identify which drive bays those are? I know which disk is in each drive bay by its serial number, but I have no ata listings. There are 5 drives in each cage, so it could be a drive cage that's at fault as well as the SATA card.
Joe L. Posted August 14, 2012: Dan201 wrote: "How can I identify which drive bays those are? I know which disk is in each drive bay by its serial number, but I have no ata listings. There are 5 drives in each cage, so it could be a drive cage that's at fault as well as the SATA card." Unfortunately, there is no easy way. You can look in the syslog where the ata devices are assigned, as the disk IDs are interspersed in the syslog as well.
Dan201 (Author) Posted August 14, 2012: I've taken a look through pages of text from my old system log and my current one. I can see ata7 mentioned as well as a few others, but no mention of serial numbers, just the drive model. My drives are all the same model. Are there any plans to develop an interface to deal with errors outside of huge amounts of text? It sounds like this fault isn't unRAID's fault, but unRAID makes it incredibly hard to find out. If it wasn't for these forums I would have no idea where to start at all.
Dan201 (Author) Posted August 14, 2012: Is there some way to get a list of the drives in ata allocation? That way I could disconnect various drives to determine which drives are having the problem.
Joe L. Posted August 17, 2012: Dan201 wrote: "Is there some way to get a list of the drives in ata allocation? That way I could disconnect various drives to determine which drives are having the problem." It depends on the version of unRAID you are running. I posted this exact answer for the newer 5.0beta releases yesterday. It is here: http://lime-technology.com/forum/index.php?topic=21493.msg195355;topicseen#msg195355 There is no easy way for 4.7 or earlier, other than to examine the proximity of the messages in the syslog to the device assignments. Joe L.
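The syslog approach described above can be narrowed down with grep instead of reading pages of text. This is only a minimal sketch: it assumes the kernel's libata messages carry the usual `ataN:` or `ataN.00:` prefix and that a copy of the syslog is available as text (commonly /var/log/syslog). The helper names `ata_lines` and `ata_ports` are hypothetical, not part of unRAID.

```shell
# Hypothetical helper: keep only the log lines that mention an ataN port.
# Reads log text on stdin, so it also works on a saved copy of the syslog.
ata_lines() {
    grep -wE 'ata[0-9]+'
}

# Hypothetical helper: list the distinct ataN ports mentioned, in numeric order.
# -w keeps words such as "data1" from being mistaken for a port name.
ata_ports() {
    grep -owE 'ata[0-9]+' | sed 's/^ata//' | sort -nu | sed 's/^/ata/'
}

# Example use (the path is an assumption):
#   ata_ports < /var/log/syslog
#   ata_lines < /var/log/syslog | less
```

Reading the `ata_lines` output in order keeps each port's link-up and identify messages next to any error lines for the same port, which is the "proximity" check without the surrounding noise.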
Dan201 (Author) Posted August 18, 2012: Thank you for replying. I have entered the command and attached the result. It lists the drives with their 'sd' allocation but I can't see how that corresponds to the ata allocation from the previous result. What am I missing? Kind regards
Joe L. Posted August 18, 2012: On my server, running version "5.0-rc6-r8168-test2", it looks like this:
    root@Tower2:/dev/block# ls -l /sys/block/[hs]d* | sed -e 's^.*/sys/block^/^' -e 's^/^ ^g' -e 's^\.\. devices ^^'
    sda -> pci0000:00 0000:00:1e.0 0000:05:00.0 ata1 host0 target0:0:0 0:0:0:0 block sda
    sdb -> pci0000:00 0000:00:1e.0 0000:05:00.0 ata2 host1 target1:0:0 1:0:0:0 block sdb
    sdc -> pci0000:00 0000:00:1f.2 ata3 host2 target2:0:0 2:0:0:0 block sdc
    sdd -> pci0000:00 0000:00:1f.2 ata4 host3 target3:0:0 3:0:0:0 block sdd
    sde -> pci0000:00 0000:00:1d.7 usb2 2-4 2-4:1.0 host8 target8:0:0 8:0:0:0 block sde
    sdf -> pci0000:00 0000:00:1f.2 ata5 host4 target4:0:0 4:0:0:0 block sdf
    sdg -> pci0000:00 0000:00:1f.2 ata6 host5 target5:0:0 5:0:0:0 block sdg
    sdh -> pci0000:00 0000:00:1f.2 ata7 host6 target6:0:0 6:0:0:0 block sdh
    sdi -> pci0000:00 0000:00:1f.2 ata8 host7 target7:0:0 7:0:0:0 block sdi
    sdj -> pci0000:00 0000:00:01.0 0000:01:00.0 ata10 host10 target10:0:0 10:0:0:0 block sdj
    sdk -> pci0000:00 0000:00:1c.4 0000:03:00.0 ata11 host11 target11:0:0 11:0:0:0 block sdk
    sdl -> pci0000:00 0000:00:1c.4 0000:03:00.0 ata12 host12 target12:0:0 12:0:0:0 block sdl
It is very easy to see the mapping here on my newer server.
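The listing above gives the sdX-to-ataN half of the mapping; tying it back to a serial number needs one more lookup. The following is a rough sketch rather than a tested unRAID tool. It assumes a kernel that exposes the `ataN` component in the sysfs path (as in the 3.4.4 listing above) and that udev populates /dev/disk/by-id with names containing the model and serial number. The function names `ata_of_path`, `list_ata_map` and `ids_of_disk` are made up for illustration.

```shell
# Print the ataN component of a sysfs device path, or "-" when there is none
# (for example a USB stick, or a kernel that does not expose ataN in the path).
ata_of_path() {
    printf '%s\n' "$1" | tr '/' '\n' | grep -E '^ata[0-9]+$' || echo "-"
}

# Print one "sdX ataN" pair per disk by resolving each /sys/block symlink.
list_ata_map() {
    for dev in /sys/block/[hs]d*; do
        [ -e "$dev" ] || continue
        printf '%s %s\n' "${dev##*/}" "$(ata_of_path "$(readlink -f "$dev")")"
    done
}

# Print the /dev/disk/by-id names (model and serial) that point at disk $1,
# e.g. ids_of_disk sdh. Assumes udev has created /dev/disk/by-id.
ids_of_disk() {
    for link in /dev/disk/by-id/*; do
        if [ "$(readlink -f "$link")" = "/dev/$1" ]; then
            echo "${link##*/}"
        fi
    done
}
```

Running `list_ata_map` and then `ids_of_disk` for the sdX names of interest would give the chain ataN, sdX, serial number, which is what is needed to find the physical bay.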
Joe L. Posted August 18, 2012: What does it look like when you type:
    ls -l /sys/block/sd*
Dan201 (Author) Posted August 18, 2012: I'm running RC5. Here are the other results.
Joe L. Posted August 18, 2012: Dan201 wrote: "I'm running RC5. Here are the other results." Apparently rc5 uses a much older kernel version. On my server:
    root@Tower2:/dev/block# uname -r
    3.4.4-unRAID
You seem to be on 3.0.35-unRAID. Oh well, I tried. Joe L.
Dan201 (Author) Posted August 18, 2012: Thank you for your help. At least I know how to track down the problem drives now; I just need to get hold of a newer version. No parity protection until I get it.
Dan201 (Author) Posted August 19, 2012: Do you know if the previous versions of the RC used a newer kernel? I could downgrade for a while to find out which drives are causing the problem.
Dan201 (Author) Posted August 20, 2012: Thank you. Is there an expected release date for RC6?
Joe L. Posted August 20, 2012: Dan201 wrote: "Thank you. Is there an expected release date for RC6?" Yes... a bit over 1 month ago. The first release of rc6 was July 10, 2012: http://lime-technology.com/forum/index.php?topic=21377.0 The second release of rc6 was July 21, 2012: http://lime-technology.com/forum/index.php?topic=21597.0 I suggest the second, even if you do not have the mptsas disk controllers, as it also contains other fixes. Joe L.
Dan201 (Author) Posted August 20, 2012: Great! I was mistakenly looking in the downloads section for it. I have rerun the command to find out the ata allocations and only 1 drive shows one. The drive that is connected directly to the motherboard comes up as ata2. All the others have port numbers instead, which I assume is because they are connected to SATA controller cards. Looks like we've hit another wall. Am I really the first person to have a problem with drives that use SATA cards? How have others found out which drives have been the problem? My server is split into 3 drive cages with 5 drives in each, plus a separate parity drive. 5 drives are identified in the syslog as having an issue of some sort, which makes me think it's a drive cage that is at fault. Can I remove each cage 1 at a time, rerun the parity check and see which cage is causing the problem? At the moment the server crashes each time I run a parity check. If it runs successfully, can I assume that the missing drive cage is the faulty one? And will I face any problems adding the other drives back into the array afterwards? Kind regards
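Where the listing shows no ataN component for drives on add-in cards, the PCI address in the same sysfs path may still answer the earlier question of whether the failing drives share a controller. The sketch below assumes the sysfs paths have the shape shown in Joe L.'s listing, where the last PCI address component (such as 0000:05:00.0) is the controller the disk hangs off. `controller_of_path` and `list_by_controller` are hypothetical helper names.

```shell
# Print the last PCI address (domain:bus:device.function) found in a sysfs
# device path; that component is taken to be the disk's controller.
controller_of_path() {
    printf '%s\n' "$1" | tr '/' '\n' \
        | grep -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$' | tail -n 1
}

# Print "controller sdX" pairs sorted by controller, so that disks sharing
# a card end up on adjacent lines.
list_by_controller() {
    for dev in /sys/block/sd*; do
        [ -e "$dev" ] || continue
        printf '%s %s\n' "$(controller_of_path "$(readlink -f "$dev")")" "${dev##*/}"
    done | sort
}
```

If the 5 drives flagged in the syslog all report the same controller address, the card or its cabling becomes a stronger suspect than the cage; if they are spread over several controllers, the cage is the more likely common factor.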
JonathanM Posted August 20, 2012: You can NOT run a parity check with a missing drive, as there is nothing to check against. If you mean temporarily installing 1 drive at a time directly to the controller instead of installed in a cage, then that should work fine as a troubleshooting technique.
Dan201 (Author) Posted August 20, 2012: I need to eliminate a drive cage at a time. If I remove 5 drives, will there be an option to rebuild the parity instead of checking it? And when I reassign the removed drives they should just go back without being wiped, correct?
JonathanM Posted August 20, 2012: I think it would be a better test to move all 5 drives to directly connect to the controller and check that way. A parity build is a different operation than a parity check, and may not stress the system in the same way.
Dan201 (Author) Posted August 21, 2012: To try and find the problem I removed 1 drive cage at a time and rebuilt the parity. Each time the server became unresponsive and required a hard reboot. This makes me think the problem isn't with the cages. I then replaced the parity drive with a spare one. Again it failed. Next I assigned the parity drive to a different bay and I'm currently rebuilding the parity. It's at 3% so far, further than I've gotten in a long time. So far around 1.2 TB of data has gone missing from the drives, with hundreds of thousands of errors now on disk 10. I really wish I had stuck to WHS.
Dan201 (Author) Posted August 21, 2012: The parity build failed again. At the moment the best I can do is start the array with no parity attempt. That way I can access my files at least. Well, the ones that are left anyway. EDIT: The missing data has reappeared.
Dan201 (Author) Posted August 21, 2012: Would it be worth changing controller cards? I could probably get new ones, test them and return them if it makes no difference. Any recommendations?
This topic is now archived and is closed to further replies.