multiple drive issues



I've got a few issues going on right now that I'm not sure how to handle. I have one disk that has a green ball but is showing errors on the dashboard, and when I stop the array it says that the disk is missing. I also have one disk that all of a sudden has a grey triangle next to it. I have 2 spare drives ready to go but have no idea how to actually tackle this one.

 

I have attached a screenshot of the dashboard, the syslog, and the SMART data.

 

This is for v6 b15

disk7-attributes.zip

disk12-attributes_1.zip

syslog.zip

Capture.PNG.3bbcf5cae9739340695613cbfb1b0ebb.PNG

Link to comment

New problem. I pulled drive 12 out and put it into another Linux box to see if I could view any data on it, which I couldn't. I put it back into my main unRAID box, and now it is detected as a new drive rather than the one that was assigned as drive 12.

Link to comment

OK. I see you started this thread in the right place about this so I will quit replying to you over on the other thread. The smart attributes for drive7 you posted are empty. If you go to the Health section I told you about and click on Disk attributes, does it display the disk attributes?

 

Exactly what did you do with drive12 in the other linux box?

 

Most importantly, do you have backups?

Link to comment

It doesn't show anything for disk attributes. When I go to the error log I do see this, though: scsiModePageOffset: response length too short, resp_len=47 offset=50 bd_len=46

 

I didn't do anything other than boot it up in the unRAID box that I use for pre-clearing.

 

Backups, no... I'm not sure how one could have a server like this plus 32+TB of backups.

 

In the grand scheme of things, is there any possibility of rebuilding any of this data, or am I pretty much just losing the 4ish TB of data between these 2?

Link to comment

I suspect file system corruption on disk12. Haven't seen much on the forum about fixing XFS in this situation. Maybe someone else will know.

 

You may have to go to the command line to get smart reports for disk7. See v6 help link in my sig.

 

I would say rebuilding is extremely unlikely since unRAID can only rebuild one disk if all other disks are healthy. Some file recovery may be possible though. We'll have to see what we can figure out. How important is the data?

 

Since there have been some changes since your initial post, another screenshot and syslog please, and disk7 SMART if you can get it.

Link to comment

Would it be possible to do this:

 

Create a new config that does not have drive 12 in it. In theory, drive 7 should be "functional" with the new config. Once that's up, I can take the array back down, replace drive 7 with a new drive, and then rebuild from parity? In that case I would only lose the 2TB that was on drive 12.

 

Also, since drive 12 is now showing up as sdp, I'm feeling like I should run some tests on it to see if it is actually malfunctioning. If it passes those tests, should I put it back in my array and assume it was some sort of corruption rather than a problem with the drive itself?

Link to comment

Parity included disk12 so you cannot rebuild without it. See this wiki for a better understanding of how parity works.
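
To make the parity point concrete, here is a toy sketch in Python of single-parity XOR, which is how single parity is usually described. It is only an illustration under assumptions: the block contents are made up, and `parity_of` and `rebuild_one` are hypothetical helper names, not anything from unRAID itself.

```python
from functools import reduce


def parity_of(blocks):
    """XOR all equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_one(surviving_blocks, parity):
    """Recover a single missing block by XOR-ing parity with every surviving block."""
    return parity_of(list(surviving_blocks) + [parity])


# Toy "disks" with made-up contents (4 bytes each).
disk1 = b"\x10\x20\x30\x40"
disk7 = b"\x0a\x0b\x0c\x0d"
disk12 = b"\xff\x00\xff\x00"

# Parity was computed while disk12 was part of the array.
parity = parity_of([disk1, disk7, disk12])

# Only disk7 missing: the rebuild is exact.
recovered_disk7 = rebuild_one([disk1, disk12], parity)

# disk7 rebuilt with disk12 left out: the result is disk7 XOR disk12,
# which is neither disk's real contents.
wrong_disk7 = rebuild_one([disk1], parity)

print(recovered_disk7 == disk7)  # exact match
print(wrong_disk7 == disk7)      # no match
```

In this sketch, dropping disk12 from the calculation while reusing the old parity gives back a mix of disk7 and disk12 rather than disk7, which is why a new config without disk12 cannot be used to rebuild disk7 from the existing parity.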

 

I have some ideas but I would like to think about them some more and get more information and other opinions before proceeding.

 

Can you read files on disk7?

Link to comment

They just happened. Yesterday I did my scheduled parity check and then today I have these issues.

Reviewing the thread, I thought it might be useful to explore this a little more. Since you are on b15, did you have any notifications set up, and did you get any notifications about these drives?

 

Before we proceed with any approach, I would like some assurance that the other drives are healthy. Take a look at the disk attributes for the other drives and see if any of them are showing up as suspect. Post them if you are unsure.

Link to comment

I did not have notifications set up but will be setting those up going forward. I looked through the other drives today and didn't see anything that caught my eye. I do have a couple of drives where SMART is saying Old Age, but that's about it.

Link to comment

One thing missing in your screenshots is the section "Array Operation". Post that section, it might give us a better idea of what unRAID is "thinking".

 

The latest screenshot says Array of 12 devices. That includes the parity disk. There is also a cache disk, and 2 Unassigned, sdo and sdp. You mentioned, and your first screenshot confirms, that sdp was Disk12. What is or was sdo?

 

Since you have 15 drives connected, I assume these are not all connected directly to the motherboard. How are the 2 problem drives connected?

 

Sorry it's taking so long to come up with a recommended way forward. I am hoping others will join in so we can weigh different options. In the meantime I am trying to get a more complete picture for people to review.

 

There are 2 basic approaches I am considering. One of these is to just forget about parity and try to repair the file systems on the drives. The other approach would be to try to repair the file systems while maintaining parity.

 

The 2nd approach is the usual way but would not do much good if we don't think parity is likely to be good. I'm not sure if it makes sense with multiple drive issues.

 

The first approach would allow us to try to deal with these drives separately from the array and each other. This would mean that we could rebuild parity against the remaining drives, and so get them back to parity protected, while we try to tackle the problem drives.

 

I will see if I can round up some other (real) experts to take a look at this thread.

Link to comment

I appreciate the assistance trurl, kind of at a loss here as well.

 

Attached is the Array operation section and the bottom of the drives listing.

 

I pulled drive 12 out and put it into another unRAID box to see if I could view the data on it. That didn't work, so I put it back into this machine and it showed up as sdp. I recently pulled it back out of this box and put it into a Windows box to see if I could find any data, to no avail. It shows up in Disk Management, but I can't assign a letter to it so I can browse. As you can see in that screenshot, sdp is no longer connected.

 

One of the drives is on the motherboard and the other is on a SUPERMICRO AOC-SASLP-MV8.

Capture.PNG.fa50cfb46c2e9e9b4e7ac79acaddd7c8.PNG

Link to comment

Even if disk12 was perfect as far as unRAID was concerned, you wouldn't be able to read it in Windows without some additional software, since Windows does not natively support XFS. Same goes for your reiserfs drives. The only drive in your screenshot that Windows can read is the flash.

Link to comment

sdo is a new 4TB drive. Drive 12 was on the Supermicro card. That card lets me go SATA or SAS, so these are all SATA drives. Drive 12 is detected in Windows, so I'm going to assume it will show up in the BIOS on my unRAID box as well. But to test that I need to shut down the machine I'm typing this on to remove it. I will update this afterwards.

 

EDIT: Yes, I can see drive 12 in the BIOS on my unRAID box.

Link to comment

There was mention of another machine to do pre-clears.

 

At this point with two failed drives, I might try to access and verify whatever drive succeeds with the smart test.

From what I saw in prior posts, one drive is not responding correctly with SMART.

 

So putting the two drives in that machine, perhaps one at a time, and trying to get a SMART report out of each one will help determine what is viable.

 

If a drive does not respond properly to a SMART request chances are it's a hard failure.

Sometimes this shows up as a drive SMART issue, but it's really the controller or cable or connector causing issue.

Thus, you need to inspect the hardware with a good strong flashlight. Look for oxidation.

Did some event occur, such as a power glitch?

 

I've seen drives report funky SMART issues before when using incompatible hardware. However, I've also seen it on drives that were really failing.

 

At this point knowing what drives are good and/or bad is paramount before starting any recovery.

 

Perhaps put unRAID on a different flash, don't assign any drives and operate from the command line with smartctl commands first.
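
As a rough sketch of the "command line with smartctl" idea, here is a small Python wrapper around `smartctl -a`. Treat it as an assumption-laden example: it assumes Python 3.7+ and smartmontools are available on whatever machine you run it from (Python is an assumption on a stock unRAID install), `smartctl_command` and `smart_report` are hypothetical helper names, and `/dev/sdp` is just the device name mentioned earlier in this thread, so substitute your own.

```python
import shutil
import subprocess


def smartctl_command(device):
    """Build the argument list for a full SMART report on one device."""
    return ["smartctl", "-a", device]


def smart_report(device):
    """Return smartctl's text output for a device, or None if smartctl is not installed."""
    if shutil.which("smartctl") is None:
        return None
    # smartctl uses its exit status as a bitmask of findings, so a non-zero
    # status does not mean the report is missing; do not raise on it.
    result = subprocess.run(smartctl_command(device), capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical device list: replace with the devices of the drives in question.
    for device in ["/dev/sdp"]:
        report = smart_report(device)
        print(f"==== {device} ====")
        print(report if report else "no report (smartctl missing or no output)")
```

If a drive returns an empty or error-only report here as well, that fits the suggestion above to check the cable, connector, and controller before blaming the drive itself.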

Link to comment

Disk7 is failing. Lots of reallocated sectors. Disk12 looks fine. What unRAID version are you running in your preclear machine? If disk12 is XFS and you are running an older unRAID version from before XFS support, that might explain why you couldn't read it.
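
For anyone who wants to check their other drives for the same symptom, here is a minimal Python sketch that scans the attribute table from a saved `smartctl -A` report for non-zero raw values on a few commonly watched attributes. The sample report text and its numbers are made up for illustration (they are not disk7's actual data), and `suspect_attributes` is a hypothetical helper name.

```python
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspect_attributes(report_text, watched=WATCHED):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in report_text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        name, raw = parts[1], parts[9]
        if name in watched and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Made-up sample in the layout smartctl uses for its attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       1208
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       26311
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(suspect_attributes(SAMPLE))
```

Note that `Old_age` in the TYPE column is only the attribute's category, not a warning by itself; what matters in this sketch is the raw value of the watched attributes.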

 

Best possible outcome is we get disk12 working and rebuild disk7.

 

Before doing anything, make a backup of the config directory on the flash of your main server.

Link to comment

I'm running v6 b15. Disk 12 was my only XFS drive, as I had just added it a month or so ago and didn't realize it hadn't formatted it in xfs. I have backed up the config directory.

 

Do we need to be concerned that disk 12 is now showing up as something other than disk 12?

Link to comment
