
Drive shows "invalid data" and shares are read only after hardware move.


knalbone


I moved all my drives and my USB stick to a new server today. Upon bootup the configuration looked fine, so I started the array and all was well.  Just a few minutes later, drive 4 showed a grey exclamation mark (invalid data, I think). I stopped the array and replaced disk 4 with a precleared spare.  It looks like a rebuild started, but I think it stopped.  The GUI shows no activity and the hourly reports simply state that one of the drives contains invalid data and the array requires attention.

 

To make matters worse, disk 2 is showing many errors. They all appear to be write errors, and the disk passed a short SMART self-test, so I'm not sure what is going on.  I feel like my "old" disk 4 is fine and wonder if it is possible to force it back into the array, trust parity, and try to rebuild disk 2.
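In case it helps, the short self-test and report can be run from the console with something like this (assuming disk 2 shows up as /dev/sdb here; substitute the correct device):

  smartctl -t short /dev/sdb     # start a short self-test (takes a few minutes)
  smartctl -l selftest /dev/sdb  # show the self-test results once it finishes
  smartctl -a /dev/sdb           # full SMART attributes and error log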

 

This is the second time I have tried to open a topic.  Last time I attached my syslog and it looks like the post never appeared.  I will try to attach the syslog in a reply message.

Link to comment

I just powered down the server and tried moving the drives in question to different locations on the backplane.  No change.  :(

Anyone have any ideas?  Only shares with caching enabled appear to be writable at the moment. All the data appears to be intact; I just can't write to the array, and the GUI looks like this:

 

[screenshot of the GUI]

Link to comment

Your syslog has some disk4 write errors, and then later even more write errors on disk2. Normally unRAID disables a disk when it has write errors, which may be what the triangle on disk4 is about, but I am not sure whether it will also disable another disk if it has errors after that. Maybe it disables writes to the whole array, which would seem reasonable since parity won't be able to "absorb" the unwritten data from both drives.
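One thing that might help narrow it down: I believe the md driver's status output lists a per-slot state and error count, so something like this from the console should show which slots it currently considers disabled (the field names here are from memory, so treat this as a sketch):

  egrep "rdevStatus|rdevNumErrors" /proc/mdstat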

 

I think the SMART reports are OK, but I'm not sure how to proceed from here. Is rebuilding disk4 the way forward?

 

Checking the cables would probably be a good idea, to try to get rid of whatever caused this, but don't shut down yet. Wait and see what others suggest.

Link to comment

A rebuild isn't even an option, apparently; I have no option for it when I stop the array. And it's too late on not shutting down: as I mentioned earlier, I already shut the server down and moved disks 2 and 4 to different spots on the backplane, but that had no effect.

Link to comment

The first thing to do is make sure nothing continues to try to write to the array. With the write errors you have already had, the disks and parity are not going to be consistent with each other. When a drive has a write error, the parity is updated anyway, so the disk can be rebuilt with the data that failed to be written. So it doesn't seem like a New Config with trust parity is the right approach.

 

I think we need to figure out what caused the write errors and try to eliminate that before doing anything else. Then maybe we'll have to check the filesystems on disk2 and disk4 and repair if necessary, and then maybe rebuild parity.
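For the filesystem checks, assuming the data disks are ReiserFS (the default here), the usual approach is to start the array in Maintenance mode and run the check against the md device for each slot, so that parity stays in sync with any repairs:

  reiserfsck --check /dev/md2   # read-only check of disk2
  reiserfsck --check /dev/md4   # read-only check of disk4

Only move on to --fix-fixable if the read-only check actually reports problems.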

 

Something that would probably make sense and won't make anything worse is to do a memory test. You can select that from the boot menu.

 

Have you checked all SATA and power cables and plugs at both ends? What model is your power supply?

 

Link to comment

Something about the hardware move must have screwed things up. I don't think anything has been written anywhere other than the cache drive since this all started.

I am running this as a VM on a Dell PowerEdge C2100 running ESXi 5.5, with an LSI 2008 SAS controller passed through to unRAID. I may just move everything back to the physical box it was on. For now I am going to bed.

Link to comment

I was able to start a rebuild this morning.  I basically unmounted the shares and stopped all services on other VMs/clients that were trying to write to the array.  The rebuild is very slow (about 4 MB/s), but at least it didn't stop almost immediately like it did before.
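For anyone else hitting this, a quick way to check whether anything still has files open on the array (assuming fuser is available on the console and the shares live under /mnt/user) is something like:

  fuser -vm /mnt/user   # list processes with files open under the user shares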

 

unRAID says the rebuild will take >10 days.  Should I just let it run?

Link to comment

I guess something is truly wrong with /dev/sdd.  I restarted the rebuild using my precleared spare and it is flying at ~140MB/s.  I will let it finish (about 9 hours).  What can I do to test /dev/sdd (bad disk) when the rebuild is finished?

Link to comment

Preclear it.
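(Assuming Joe L.'s preclear script is on your flash drive, a single cycle on the suspect disk would look roughly like the lines below; double-check the device name first, since a preclear wipes the disk.)

  preclear_disk.sh -l              # list disks that are candidates for preclearing
  preclear_disk.sh -c 1 /dev/sdd   # run one full preclear cycle on the suspect drive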
Link to comment

Thanks to everyone who has read and helped on this. The rebuild finished yesterday around 7 pm. I started using the array again and everything was well.  Then I started preclearing /dev/sdd, and all of a sudden disk 2 was disabled again! AAARRG! Something really screwy is going on.

 

I'm going to preclear sdd for one cycle, then I will rebuild the array and hopefully all will be well. I will remove the supposedly bad disk so I can preclear it on another PC and see what happens. I think I'll order another drive or two as well and preclear them so I have them on hand.  I just can't figure out what is going on with this server.

Link to comment

I have not yet reseated the controller.  That is a bit of an ordeal, as it involves taking multiple VMs down.  I agree that the signs point to it being the problem, though. It is a mezzanine card specific to this model of server, so it can only fit in one spot directly on the motherboard. I will make sure to reseat it when I pull the drive out.

Link to comment

Archived

This topic is now archived and is closed to further replies.
