Treytor Posted January 21, 2011

> If nothing else works, I will propose the "brute force" approach before reformatting. You have a Norco, so the disks in question are easy to remove. Attach them (one at a time) to a Windows computer and use Yareg - http://yareg.akucom.de/ Very interesting "news" there from November 7th, 2010 - apparently some people are using it for data recovery of ReiserFS-partitioned NAS disks (not physically damaged), so you have nothing to lose. Again - only if Tom's and Joe L.'s suggestions do not work.

Thanks for the tip! I will try this if we really do hit a roadblock here and have to reformat anyway.
Treytor Posted January 21, 2011

> At this point, I would like to ask you to perform a memory test... Just in case it is the root cause.
>
> I did an extensive memory test before even booting unRAID after the initial build. Everything checks out.
>
> OK, but memory has been known to go bad... and I did not want to overlook anything. Joe L.

I'll check again. If I have 4 GB, will it all get scanned with memtest 4.0? Or will I have to remove a stick, scan, then swap and scan?

EDIT: Looks like we should update memtest on the unRAID stick to 4.10? Right at the top of the changelog it lists support for Intel i3, which is what this new build has (as directed by the unRAID build wiki).
limetech Posted January 21, 2011

You need to Stop the array first; then the mount command I showed you before will work.

You said you upgraded to 5.0-beta2... from what previous version? Did you try going back to that version to see if the issue is still there?

Something that can cause this behavior - where you can mount the partition but the 'md' device does not mount - is a messed-up partition table. After Stopping the array, type this command and post its output:

fdisk -lu /dev/sdp
Treytor Posted January 21, 2011

Oh, my mistake. I upgraded from 4.6 final, and I remember it doing something immediately after upgrading. I'm sorry, but I don't remember what it was, as I really didn't look thoroughly; I just assumed it was updating something with the array for the new version. Because of this I didn't think to try going back to 4.6, for fear of making things even worse.

Linux 2.6.32.9-unRAID.

```
root@Cooper:~# mkdir /x
root@Cooper:~# mount /dev/sdp1 /x
root@Cooper:~# ls /x
Backup/
root@Cooper:~# umount /dev/sdp1 /x
umount: /dev/sdp1: not mounted
umount: /dev/sdp1: not mounted
root@Cooper:~# mount /dev/sdg1 /x
root@Cooper:~# ls /x
Backup/  Movies/  Software/  TV/
root@Cooper:~# umount /dev/sdp1 /x
umount: /dev/sdp1: not mounted
root@Cooper:~# fdisk -lu /dev/sdp

Disk /dev/sdp: 1500.3 GB, 1500301910016 bytes
1 heads, 63 sectors/track, 46512336 cylinders, total 2930277168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdp1              63  2930277167  1465138552+  83  Linux
Partition 1 does not end on cylinder boundary.

root@Cooper:~# fdisk -lu /dev/sdg

Disk /dev/sdg: 500.1 GB, 500107862016 bytes
1 heads, 63 sectors/track, 15504336 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdg1              63   976773167   488386552+  83  Linux
Partition 1 does not end on cylinder boundary.
```

Interesting...

Did another memtest as well. Everything looks good.
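For what it's worth, the fdisk output above is internally consistent. As a quick sketch (using the values copied from the /dev/sdp transcript), the 1 KiB block count fdisk reports can be recomputed from the start and end sectors:

```shell
# Recompute fdisk's 1 KiB block count for /dev/sdp1 from its sector range.
# fdisk reports 512-byte sectors, so blocks = sectors / 2; the trailing '+'
# in fdisk's output marks the leftover odd sector.
start=63
end=2930277167
sectors=$((end - start + 1))
blocks=$((sectors / 2))
echo "${sectors} sectors = ${blocks} blocks"
```

This matches the reported "1465138552+" blocks, so the partition table itself looks sane.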
prostuff1 Posted January 21, 2011

> Did another memtest as well. Everything looks good.

Memtest will have to be run for longer than an hour. At least run it overnight to make sure everything is good to go. On my customers' builds I generally run Memtest for 24 hours (if not longer).
Treytor Posted January 21, 2011

> Did another memtest as well. Everything looks good.
>
> Memtest will have to be run for longer than an hour. At least run it overnight to make sure everything is good to go. On my customers' builds I generally run Memtest for 24 hours (if not longer).

The first time I ran it, on the first build (2 days ago), it ran overnight. Everything was fine. Running again right now with 4.10.
Joe L. Posted January 21, 2011

Looks like the disks can be mounted outside of the array, and the partitions, as reported by fdisk -lu, look quite normal to me. Perhaps Tom will have the clues he needs to make the next suggestion.

Joe L.
lionelhutz Posted January 21, 2011

Maybe try physically disconnecting one of the 2 drives and see what happens. If nothing good comes of it, swap which one is disconnected. I don't see any harm in trying 4.6 again, since the upgrade seems to be when the issues started.

Peter
Treytor Posted January 21, 2011

Even if it may have finished? The problem was there before upgrading, by the way. I know, upgrading probably wasn't a good idea with this issue still at large, but here we are.
Joe L. Posted January 21, 2011

> Even if it may have finished? The problem was there before upgrading, by the way. I know, upgrading probably wasn't a good idea with this issue still at large, but here we are.

I would not bother downgrading then, since the issue was there before the upgrade.
limetech Posted January 21, 2011

I recommend you Stop the array, then from the console/telnet type this:

initconfig

Answer Yes to the 'are you sure' prompt. Now go back to the webGui and click Refresh - all your drives should have a blue dot. Now click Start; all data drives should mount and a parity sync should start. If this is not the case, don't do anything else, just report back what happens.
Treytor Posted January 21, 2011

Well, cool! All drives are green, parity is orange, and it's doing the parity check. Should I post a syslog?
Treytor Posted January 21, 2011

Never mind, this is going to take a while. Once it's done I'll reboot and post a syslog. Thanks again for all your help!
Treytor Posted January 21, 2011

If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?
Joe L. Posted January 21, 2011

> If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?

Don't forget that one of your drives had a ton of sectors pending re-allocation. Those are still unreadable. Basically, you've lost the data in them now, as you are overwriting parity with whatever returns when a read fails (probably zeros). That drive MUST be replaced. You'll just have to regroup with whatever is lost. It might be files, it might be empty space on the drive; there's no way to know. One thing for sure: replace that drive. unRAID will rebuild what it can, but some sectors are gone forever. It is why I was trying every way I knew not to overwrite parity.

Joe L.
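A quick way to keep an eye on a drive's pending-sector count is to read the Current_Pending_Sector attribute from its SMART data. Here is a minimal sketch of pulling the raw value out of smartctl-style output; the sample line and its raw value of 1528 are made up for illustration, and on the server you would pipe `smartctl -A /dev/sdX` in place of the inlined sample:

```shell
# Extract the raw Current_Pending_Sector count from smartctl -A style output.
# The sample line below is hypothetical; on a live system use:
#   smartctl -A /dev/sdX | awk '/Current_Pending_Sector/ {print $NF}'
sample='197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1528'
pending=$(printf '%s\n' "$sample" | awk '/Current_Pending_Sector/ {print $NF}')
if [ "$pending" -gt 0 ]; then
    echo "WARNING: $pending sectors pending reallocation"
fi
```

A nonzero count that keeps climbing is the pattern Joe L. is warning about; a count that drops back to zero after a rewrite is more in line with limetech's "grain of salt" reading below in the thread's later posts.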
Treytor Posted January 21, 2011

Yeah, I looked at that drive and decided it had nothing important on it anyway, so that is fine.

I'm remembering now that upgrading to unRAID 5 did something with the permissions. I formatted the one problematic drive that had no important data on it, but the other has data on it that I need. I can't access it, as it's saying I don't have permissions. How do I force unRAID to apply the updated permissions to this drive?
Treytor Posted January 21, 2011

I guess just re-running the utility under Utils will do the trick.
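For reference, the effect of the permissions utility can be approximated by hand. This is only a hedged sketch demonstrated on a temporary directory: the ownership scheme (nobody:users) and the exact mode bits are assumptions, and the built-in utility under Utils is the authoritative tool. On a real server you would point TARGET at the affected disk (e.g. /mnt/disk2) and run as root:

```shell
# Approximate unRAID 5-style share permissions by hand (sketch only;
# prefer the built-in permissions utility under Utils).
# Demonstrated on a temp dir so it is safe to run anywhere.
umask 022
TARGET=$(mktemp -d)                  # on the server: TARGET=/mnt/diskN
touch "$TARGET/example.txt"
# chown -R nobody:users "$TARGET"    # needs root; assumed ownership scheme
chmod -R u+rwX,g+rwX,o+rX "$TARGET"  # read/write for owner and group, read for others
ls -l "$TARGET/example.txt"
```

The capital X in the symbolic mode grants execute only on directories (and files already marked executable), so traversal works without making every data file executable.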
limetech Posted January 21, 2011

> If one of the drives (one of the ones causing a problem) has errors during the parity check, does that mean the drive is physically bad, or that the data on the drive may be corrupted?
>
> Don't forget that one of your drives had a ton of sectors pending re-allocation. Those are still unreadable. Basically, you've lost the data in them now, as you are overwriting parity with whatever returns when a read fails (probably zeros).

Not sure that's true. I didn't see any actual read errors reported by the driver in any of the syslogs he attached. In my experience you have to take the SMART counters with a "grain of salt". Historically they have been notoriously buggy and "loose" in interpretation. For example, Current_Pending_Sector may not necessarily mean the sector is unreadable - it might just mean the sector is taking an inordinate number of retries and/or requiring ECC correction, and therefore, to be safe, is scheduled for reallocation. It all depends on how the firmware engineer interpreted/translated the spec, and maybe on what they need to do to get the drive to be shippable.
Treytor Posted January 22, 2011

I had the same thoughts, as I'm sure it was just a result of me interrupting some process. I have wiped and reformatted the drive and re-added it to the array. After the parity check finishes, I'll keep an eye on it as I write to the array and see if anything comes up.
Archived
This topic is now archived and is closed to further replies.