Treytor Posted January 21, 2011

> If nothing else works, I will propose the "brute force" approach before reformatting. You have a Norco, so the disks in question are easy to remove. Attach them (one at a time) to a Windows computer and use Yareg - http://yareg.akucom.de/ Very interesting "news" there from November 7th, 2010 - apparently some people are using it for data recovery of ReiserFS-partitioned NAS disks (not physically damaged), so you have nothing to lose. Again - only if Tom's and Joe L.'s suggestions do not work.

Thanks for the tip! I will try this if we really do hit a roadblock here and have to reformat anyway.
Treytor Posted January 21, 2011

> At this point, I would like to ask you to perform a memory test... Just in case it is the root cause.
>
> I did an extensive memory test before even booting unRAID after the initial build. Everything checks out.
>
> OK, but memory has been known to go bad... and I did not want to overlook anything. Joe L.

I'll check again. If I have 4 GB, will it all get scanned with memtest 4.0? Or will I have to remove a stick, scan, then swap and scan?

EDIT: Looks like we should update memtest on the unRAID stick to 4.10? Right at the top of the changelog it lists support for Intel i3, which is what this new build has (as directed by the unRAID build wiki).
limetech Posted January 21, 2011

You need to Stop the array first; then the mount command I showed you before will work.

You said you upgraded to 5.0-beta2... from what previous version? Did you try going back to that version to see if the issue is still there?

Something that can cause this behavior - where you can mount the partition but the 'md' device does not mount - is a messed-up partition table. After Stopping the array, type this command and post its output:

fdisk -lu /dev/sdp
Treytor Posted January 21, 2011

Oh, my mistake. I upgraded from 4.6 final, and I remember it doing something immediately after upgrading. I'm sorry, but I don't remember what it was, as I really didn't look thoroughly; I just assumed it was updating something with the array for the new version. Because of this I didn't think to try going back to 4.6, for fear of making things even worse.

Linux 2.6.32.9-unRAID.

```
root@Cooper:~# mkdir /x
root@Cooper:~# mount /dev/sdp1 /x
root@Cooper:~# ls /x
Backup/
root@Cooper:~# umount /dev/sdp1 /x
umount: /dev/sdp1: not mounted
umount: /dev/sdp1: not mounted
root@Cooper:~# mount /dev/sdg1 /x
root@Cooper:~# ls /x
Backup/  Movies/  Software/  TV/
root@Cooper:~# umount /dev/sdp1 /x
umount: /dev/sdp1: not mounted
root@Cooper:~# fdisk -lu /dev/sdp

Disk /dev/sdp: 1500.3 GB, 1500301910016 bytes
1 heads, 63 sectors/track, 46512336 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdp1              63  2930277167  1465138552+  83  Linux
Partition 1 does not end on cylinder boundary.

root@Cooper:~# fdisk -lu /dev/sdg

Disk /dev/sdg: 500.1 GB, 500107862016 bytes
1 heads, 63 sectors/track, 15504336 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1              63   976773167   488386552+  83  Linux
Partition 1 does not end on cylinder boundary.
```

Interesting...

Did another memtest as well. Everything looks good.
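For what it's worth, the fdisk output above is internally consistent. As a quick sketch (using the values copied from the /dev/sdp transcript), the 1 KiB block count fdisk reports can be recomputed from the start and end sectors:

```shell
# Recompute fdisk's 1 KiB block count for /dev/sdp1 from its sector range.
# fdisk reports 512-byte sectors, so blocks = sectors / 2; the trailing '+'
# in fdisk's output marks the leftover odd sector.
start=63
end=2930277167
sectors=$((end - start + 1))
blocks=$((sectors / 2))
echo "${sectors} sectors = ${blocks} blocks"
```

This matches the reported "1465138552+" blocks, so the partition table itself looks sane.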
prostuff1 Posted January 21, 2011

> Did another memtest as well. Everything looks good.

Memtest will have to be run for longer than an hour. At least run it overnight to make sure everything is good to go. On my customers' builds I generally run Memtest for 24 hours (if not longer).
Treytor Posted January 21, 2011

> Did another memtest as well. Everything looks good.
>
> Memtest will have to be run for longer than an hour. At least run it overnight to make sure everything is good to go. On my customers' builds I generally run Memtest for 24 hours (if not longer).

The first time I ran it, on the first build (2 days ago), it ran overnight. Everything was fine. Running again right now with 4.10.
Joe L. Posted January 21, 2011

Looks like the disks can be mounted outside of the array, and the partitions, as reported by fdisk -lu, look quite normal to me. Perhaps Tom will have the clues he needs to make the next suggestion.

Joe L.
lionelhutz Posted January 21, 2011

Maybe try physically disconnecting one of the 2 drives and see what happens. If nothing good comes of it, swap which one is disconnected. I don't see any harm in trying 4.6 again, since the upgrade seems to be when the issues started.

Peter
Treytor Posted January 21, 2011

Even if it may have finished? The problem was there before upgrading, by the way. I know, upgrading probably wasn't a good idea with this issue still at large, but here we are.
Joe L. Posted January 21, 2011

> Even if it may have finished? The problem was there before upgrading, by the way. I know, upgrading probably wasn't a good idea with this issue still at large, but here we are.

I would not bother downgrading then, since the issue was there before the upgrade.
limetech Posted January 21, 2011

I recommend you Stop the array, then from the console/telnet type this:

initconfig

Answer Yes to the 'are you sure' prompt. Now go back to the webGui and click Refresh - all your drives should have a blue dot. Now click Start; all data drives should mount and a parity sync should start. If this is not the case, don't do anything else, just report back what happens.
Treytor Posted January 21, 2011

Well, cool! All drives are green, parity is orange, and it's doing the parity check. Should I post a syslog?
Treytor Posted January 21, 2011

Never mind, this is going to take a while. Once it's done I'll reboot and post a syslog. Thanks again for all your help!
Treytor Posted January 21, 2011

If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?
Joe L. Posted January 21, 2011

> If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?

Don't forget that one of your drives had a ton of sectors pending re-allocation. Those are still unreadable. Basically, you've lost the data in them now, as you are overwriting parity with whatever returns when a read fails (probably zeros). That drive MUST be replaced. You'll just have to regroup with whatever is lost. It might be files, it might be empty space on the drive; there's no way to know. One thing for sure: replace that drive. unRAID will rebuild what it can, but some sectors are gone forever. It is why I was trying every way I knew not to overwrite parity.

Joe L.
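A quick way to keep an eye on a drive's pending-sector count is to read the Current_Pending_Sector attribute from its SMART data. Here is a minimal sketch of pulling the raw value out of smartctl-style output; the sample line and its raw value of 1528 are made up for illustration, and on the server you would pipe `smartctl -A /dev/sdX` in place of the inlined sample:

```shell
# Extract the raw Current_Pending_Sector count from smartctl -A style output.
# The sample line below is hypothetical; on a live system use:
#   smartctl -A /dev/sdX | awk '/Current_Pending_Sector/ {print $NF}'
sample='197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1528'
pending=$(printf '%s\n' "$sample" | awk '/Current_Pending_Sector/ {print $NF}')
if [ "$pending" -gt 0 ]; then
    echo "WARNING: $pending sectors pending reallocation"
fi
```

A nonzero count that keeps climbing is the pattern Joe L. is warning about; a count that drops back to zero after a rewrite is more in line with limetech's "grain of salt" reading below in the thread's later posts.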
Treytor Posted January 21, 2011

Yeah, I looked at that drive and decided it had nothing important on it anyway, so that is fine.

I'm remembering now that upgrading to unRAID 5 did something with the permissions. I formatted the one problematic drive that had no important data on it, but the other has data on it that I need. I can't access it, as it's saying I don't have permissions. How do I force unRAID to apply the updated permissions to this drive?
Treytor Posted January 21, 2011

I guess just re-running the utility under Utils will do the trick.
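For reference, the effect of the permissions utility can be approximated by hand. This is only a hedged sketch demonstrated on a temporary directory: the ownership scheme (nobody:users) and the exact mode bits are assumptions, and the built-in utility under Utils is the authoritative tool. On a real server you would point TARGET at the affected disk (e.g. /mnt/disk2) and run as root:

```shell
# Approximate unRAID 5-style share permissions by hand (sketch only;
# prefer the built-in permissions utility under Utils).
# Demonstrated on a temp dir so it is safe to run anywhere.
umask 022
TARGET=$(mktemp -d)                  # on the server: TARGET=/mnt/diskN
touch "$TARGET/example.txt"
# chown -R nobody:users "$TARGET"    # needs root; assumed ownership scheme
chmod -R u+rwX,g+rwX,o+rX "$TARGET"  # read/write for owner and group, read for others
ls -l "$TARGET/example.txt"
```

The capital X in the symbolic mode grants execute only on directories (and files already marked executable), so traversal works without making every data file executable.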
limetech Posted January 21, 2011

> If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?
>
> Don't forget that one of your drives had a ton of sectors pending re-allocation. Those are still unreadable. Basically, you've lost the data in them now, as you are overwriting parity with whatever returns when a read fails (probably zeros).

Not sure that's true. I didn't see any actual read errors reported by the driver in any of the syslogs he attached. In my experience you have to take the SMART counters with a "grain of salt". Historically they have been notoriously buggy and "loose" in interpretation. For example, Current_Pending_Sector may not necessarily mean the sector is unreadable - it might just mean the sector is taking an inordinate number of retries and/or requiring ECC correction, and therefore, to be safe, is scheduled for reallocation. It all depends on how the firmware engineer interpreted/translated the spec, and maybe on what they need to do to get the drive to be shippable.
Treytor Posted January 22, 2011

I had the same thoughts, as I'm sure it was just a result of me interrupting some process. I have wiped and reformatted the drive and re-added it to the array. After the parity check finishes, I'll keep an eye on it as I write to the array and see if anything comes up.
Archived
This topic is now archived and is closed to further replies.