
Upgraded from v5, what's wrong with my drive?


gprime


I say upgrade, but I really just did a fresh install, chose the same disks as before, and started up the array.

 

Here is my old array: http://i.imgur.com/2Rk8dto.png

 

This is my new array: http://i.imgur.com/b4Uf3ir.png

 

How can I tell what's wrong with disk 6? Is there a log I can check out somewhere? It seemed to be fine before (or maybe not?)

 

EDIT: Found this in the syslog:

 

Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 Sense Key : 0x3 [current] [descriptor] 
Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 ASC=0x11 ASCQ=0x4 
Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 c1 60 00 00 00 08 00 00
Jan  4 17:02:21 Tower kernel: blk_update_request: I/O error, dev sdh, sector 49504
Jan  4 17:02:21 Tower kernel: ata7: EH complete
Jan  4 17:02:21 Tower kernel: md: disk6 read error, sector=49440
Jan  4 17:02:21 Tower kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Jan  4 17:02:21 Tower kernel: REISERFS (device md6): replayed 708 transactions in 10 seconds
Jan  4 17:02:21 Tower kernel: blk_update_request: I/O error, dev sdh, sector 0
Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan  4 17:02:21 Tower kernel: sd 1:0:0:0: [sdh] tag#0 CDB: opcode=0x8a 8a 00 00 00 00 00 00 01 00 d0 00 00 00 08 00 00
Jan  4 17:02:21 Tower kernel: blk_update_request: I/O error, dev sdh, sector 65744

Link to comment

Your screenshot shows that you have 2 write errors. unRAID gurus (which I am not) will tell you that unRAID takes write errors very seriously and will take the drive out of service as soon as it sees one. (My opinion differs a bit here: in my experience with unRAID I have had drives "fail" with no S.M.A.R.T. errors or other problems, rebuilt them, and they have continued to run fine for a long time afterwards.)

 

In any case, you should post a SMART report and a syslog from your server.

 

To obtain a S.M.A.R.T. report through the GUI

1. Click on the words "Disk 6". You will be taken to a new page showing the details of the drive.

2. Scroll down and click the "Download" button to the right of "Download SMART report".

 

To obtain a S.M.A.R.T. report through the command line

Follow the instructions here: http://lime-technology.com/wiki/index.php/Console#smartctl
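
For example, something like this from the console should produce the same report (I'm guessing /dev/sdh based on your syslog; substitute whatever device letter the drive has now):

smartctl -a /dev/sdh
smartctl -a /dev/sdh > /boot/smart-disk6.txt

The second form just saves a copy to the flash drive (the filename is only an example) so it's easy to attach here.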

 

To obtain a log report through the GUI

1. Navigate to "Tools > System Log".

2. Click Download in the top right.

 

To obtain a log report through the command line

Follow the instructions here https://lime-technology.com/wiki/index.php/Viewing_the_System_Log
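
The short version, if you just want to grab it from the console (these are the standard unRAID paths, so adjust if yours differ):

cp /var/log/syslog /boot/syslog.txt

That copies the live syslog onto the flash drive so you can pull it off from another machine and attach it; tail -n 100 /var/log/syslog is handy if you only want to peek at the most recent entries.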

 

Once you provide more info, we can help you out more.

 

Thanks

Link to comment

Thanks, I'll definitely check that out. I actually didn't realize this before, but it seems as though my earlier array may have had the same issue, as the orb wasn't green like the others. I'm colourblind, so it wasn't as obvious in v5 as it is in v6.

Link to comment

Instead of posting a separate SMART report for a single drive and a separate syslog, you should always go to Tools - Diagnostics and post the complete diagnostics zip. It includes the syslog and SMART for all drives, plus a lot of other things that might be useful in diagnosing problems. Who knows, you may have other drives about to give trouble, and that could cause problems while trying to recover from your current one.

 

While it is true that unRAID and the gurus take write errors very seriously, the gurus don't suggest replacing a drive unnecessarily. It is often something other than a bad drive. However, a drive that has been disabled for write errors must be rebuilt, because it doesn't actually have valid data anymore. The valid data is in the parity array: the failed writes to the disabled disk were still used to update parity, so the data that didn't get written can be recovered. Note that a failed write is not synonymous with a file not getting written. It could be part of a file, or even worse, part of the filesystem that keeps track of files.
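
To make the parity part concrete, here is a toy example (unRAID's single parity is just a bitwise XOR across all the data disks; real stripes are obviously much bigger than 4 bits):

disk1  = 1011
disk2  = 0110  (the disabled disk)
disk3  = 1100
parity = 1011 XOR 0110 XOR 1100 = 0001

If disk2 is disabled, every read or write for it is served by recomputing its bits from the others:

disk2 = parity XOR disk1 XOR disk3 = 0001 XOR 1011 XOR 1100 = 0110

which is exactly what was (or would have been) on the disk, so nothing is lost as long as parity and the remaining disks stay good.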

Link to comment

Thanks guys. I moved the server to another room a few months ago, so I'm thinking a cable may have come loose. I have a bunch of swappable bays, so I might try rotating the drive to another spot to see if that clears up the issue. I'll also post additional logs if that doesn't help. I'm logging off now but will try this tomorrow.

Link to comment

Also, any writes to that disk that happened after it was disabled can be recovered too. unRAID will continue to accept writes for a disabled drive because it can recover them. And you can still read the drive's data, even though unRAID will not read from it until it is rebuilt; it gets the drive's data from the parity array instead.

 

Just saw your reply while I was typing this. Fixing cables and anything else that might be wrong is important, but it will not re-enable the drive. It must be rebuilt, either to a new drive or to itself. The wiki I linked will tell you everything you need, so please read it.

Link to comment

And just in case I haven't been clear on this: since your drive has been disabled for some time, there is probably a lot of the drive's data that is not actually on the drive. As soon as a write fails, unRAID disables the drive and never reads from or writes to it again until it is re-enabled by rebuilding. Any reads or writes for the drive are handled by parity calculations with all of the other disks.

Link to comment

Looking at your screens, your v5 array had disk6 disabled; did you upgrade to v6 like that?

 

How did you upgrade? By doing a new config?

 

Because on your v6 screen you have parity building and disk6 still disabled, so if you did a new config and tried to sync parity before replacing disk6 you could have invalidated your parity.

 

Link to comment

Looking at your screens, your v5 array had disk6 disabled; did you upgrade to v6 like that?

 

Yes, someone had to actually point out to me that the orb wasn't green in my screenshot. Being severely colourblind, it may have been red (I think?) for some time now. The one thing I had noticed recently was that none of my drives were ever spinning down; they all seemed to be active all the time. Would this be a symptom of reading from parity because of an offline drive?

 

How did you upgrade? By doing a new config?

 

I did a new config. I have backups of my v5 setup, but chose to set everything up from scratch.

 

Because on your v6 screen you have parity building and disk6 still disabled, so if you did a new config and tried to sync parity before replacing disk6 you could have invalidated your parity.

 

Ah shhhhiiii...

 

Feeling a little stupid about not doing anything about this before the upgrade ....

 

On the bright side the X's in v6 are really easy for me to differentiate.

 

I've moved the drive to another SATA/power connection, and while it shows up a little differently now, the SMART report still seems to fail.

I'm attaching a diagnostics log for you guys.

 

I've got work to do now, but it sounds like I've got some reading to do tonight...

tower-diagnostics-20160105-0548.zip

Link to comment

Wait for some help; maybe someone has an idea how best to proceed. If, for example, disk6 has been disabled for a month, all writes to that disk were being emulated by parity plus all the other disks, and since you did a new parity sync I don't think there's a way to get that data back. It may, however, be possible to get all or some of the data from disk6 as it was when it failed the first time.

Link to comment

Well, the SMART report for disk 6 shows 17 pending sectors. I notice that disk 1 also has 1 pending sector. Pending sectors are never good, as they mean there are sectors that cannot be read reliably, and this can affect whether a rebuild will be completely successful if another drive fails.

 

It is possible that the drives are actually fine, and if you run a pre-clear cycle the pending sectors may be cleared. However, you need to get any data off them before that can happen. From what has been said, it is possible that you do not have good parity, so disk 6 may not be rebuilt successfully. Disk 5 has not been marked as failed, but since it has a non-zero value for pending sectors, once disk 6 has been put back into operation you will want to work on getting the pending sector on disk 5 cleared as well.
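
If you want to keep an eye on those counts while you sort things out, something like this from the console should show them (device names below are just examples; use whatever letters the disks actually have):

smartctl -A /dev/sdh | grep -i pending
smartctl -A /dev/sdb | grep -i pending

The raw value of Current_Pending_Sector (attribute 197) should drop back to 0 once the suspect sectors get rewritten, which is what a pre-clear cycle does since it writes the whole disk.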

Link to comment

One thing you could try, with nothing to lose, is doing a new config again but this time trusting parity. Then stop the array, unassign disk6 and start the array again; disk6 will again be emulated. Try and see if you can read any data from it. The chances of success will depend on what this new parity sync changed.

Link to comment

One thing you could try, with nothing to lose, is doing a new config again but this time trusting parity. Then stop the array, unassign disk6 and start the array again; disk6 will again be emulated. Try and see if you can read any data from it. The chances of success will depend on what this new parity sync changed.

 

So it does show the disk as emulated, but it's also showing no disk content when I try to browse it. Hmmm.

Link to comment

That's what I was afraid of; parity was damaged on the latest sync. Is it unmountable like in the earlier v6 screenshot?

 

I don’t know if you can successfully run reiserfsck on an emulated disk.
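
If you do want to try it, my understanding is that filesystem checks are run against the md device with the array started in Maintenance mode, so something like this (disk 6 assumed; --check only reports and doesn't write anything):

reiserfsck --check /dev/md6

Only move on to the repair options (--fix-fixable or --rebuild-tree) if the check tells you to, and preferably after others here have had a look.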

 

Yeah, it's still unmountable. However, the red X is gone and I've just got that warning icon on it now. It gives me the option to rebuild; would it be worth trying that at this point?

 

Attached is the current state of the array through the UI.

current.png.3e0cf99b0d052b92d04ffa50f17ea998.png

Link to comment

Parity should be green; did you check the trust parity option?

 

You have to do a new config and before starting array check the option to trust parity, right next to the start button.

 

Then stop the array, unassign disk6 and start array again, every disk has to be green except disk6.

 

Link to comment

Parity should be green; did you check the trust parity option?

 

You have to do a new config and before starting array check the option to trust parity, right next to the start button.

 

Then stop the array, unassign disk6 and start array again, every disk has to be green except disk6.

 

Thanks, I installed from scratch again, and parity is happy now. Disk 6 is back to an X, though. I'm starting to think it was a legit disk failure.

 

EDIT: Just to be clear the red X was there even when it was still assigned to the array.

new-config.png.3a0d32b5e4cf3c6a3611a7ceb7f4666f.png

Link to comment
