[SOLVED] Did I Screw Up?

v4.5.6...just noticed bad hdd for first time; radio button blinking red. I stopped the array and rebooted; you know, wishful thinking.  :-[

 

Now the drive says "not installed". How should I and can I recover (unscathed) from here?

 

CD

I will get you started, then others will chime in.

 

Post a copy of your system log

 

While you're waiting for an answer, power down your server and check the cables and power to the drive in question. By then someone will most likely have taken a look at your system log and given you an idea of how to proceed.

You didn't screw up. A syslog from the system after the failure might have helped but it's not the end of the world.

 

You could even replace the cables instead of just checking them. Try to get the drive to show in BIOS and then show in unRAID when you reboot. If you can't get the drive to appear then you might have simply suffered an actual drive failure. If so, you'll have to just replace it and let unRAID rebuild the data to the new drive.

 

Also, do post a syslog after you boot just in case it shows something that can help.

 

http://lime-technology.com/wiki/index.php?title=Troubleshooting#Capturing_your_syslog

 

Peter

To add to the comments above, it does appear to be a drive failure of Disk 4.  It is currently being emulated.  The system can see a drive connected there, and was able to establish a SATA link to it, but did not receive any identifying response, and that is bad.  In other words, the drive is electrically present, and can set up the SATA channel for drive communications, but is apparently completely unresponsive, is unable to communicate over that SATA link, and therefore has probably failed.  You should first, as was suggested above, play with the power and SATA cables to the drive, replace them if possible, because it is always possible a cable connection has slipped or is loose, but I'm not optimistic.

  • Author

Thanks. As I said, I've been "lucky" enough that this is my first issue with unRAID...been running for like 2 years...so you know how it is; you panic, and forget all common sense. Like it could just be a loose connection or something.

 

Of course, what you're saying makes perfect sense Rob; that because the syslog shows it "sees" the drive, just can't communicate with it...that I should brace myself for the worst. I'll have a chance to crack the case open and look over the cabling tonight. If that doesn't "fix" it, I'll be online with NewEgg and ordering a replacement.

 

I appreciate the help so far; great community...and if I need to replace the drive, as I said, it'll be my first one...so I'll look forward to some help through that.

 

CD

  • Author

OK; stopped the array, powered down...cracked-open the case, made sure everything was tight, started back up. Now the array is stopped, but it looks like it senses the old, "bad" hdd as new. It's asking me to check the box to confirm I want to start the array, and do a data rebuild.

 

At this point, I'd actually like to take the disk out of the array, continue without it...and test it and see if it's going in and out on me. Make sense? How should I proceed?

 

CD

syslog-2011-01-17-2.txt

Here are two options:

 

You could rebuild onto a known good disk. I would recommend this option, as you minimize the time you operate without parity protection. In the ideal world you should use a disk that has been exercised using Joe's preclear script.

 

The second option is to let the system continue to emulate the failed disk and copy the data from that disk to another place (on the array or off). Following that you can remove the failed drive, perform an "initconfig" (see wiki) and rebuild parity.

  • Author

So...if I opt for option 2 for now, do I check it's ok to bring the array back online...and it will keep my data intact, and take that drive out of the array?

 

CD

The check and start will make unRAID attempt to rebuild onto that drive again.

 

Have you installed unMENU? If so, I believe you could pull the SMART reports for the drive.

 

You have 2 options. Either go to the Devices page and set that drive slot to unassigned, so that starting the array will simulate the drive; or leave it assigned, check the box, start the array, and see what happens when unRAID attempts to rebuild the drive.

 

Peter

 

You are right, the syslog shows Disk 4 fully operational, and talking!  Powering off and back on must have cold booted its firmware.  I should have noticed that you had only rebooted, not power-cycled the drive.  Burtjr above did suggest power cycling the drive, and I missed that.  It does not always work, but in your case, it indicates the drive is not in as bad a shape as I originally thought.

 

The easy way to restore your array is to use the Trust My Array procedure, so long as you have not modified Disk 4 in the interim.

 

The very first thing though that you want to do is get a SMART report for the drive. This basically asks the drive directly about its health. It would probably have been very useful to see the syslog as soon as you discovered the drive had been disabled, and before rebooting. That could have told us what actually happened, and whether it was the drive's fault at all. However, the SMART report should give you a pretty good idea as to whether the drive is OK, or should be monitored carefully for a while, or should be replaced as soon as possible. And that should help you decide your best option: keep as is, rebuild on a replacement drive, or save the data and remove the drive.

  • Author

Cool. I do have unMENU, and I see the SMART options under Disk Management. Let's run this sucker! Smart Status Report I assume?

 

CD

SMART.txt

That disk is in really bad shape...  According to the SMART report, it has 2047 re-allocated sectors.

  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      2047

You are WAY overdue for an RMA on it.  Most disks only have a few thousand spare sectors. Frankly I'm surprised the normalized VALUE of 100 is not lower. 

 

Now is NOT the time to wait for a sale... That disk is in really bad shape, replace it.

 

Joe L.

  • Author

Okey dokes; I can order a replacement today. Can I "re-distribute" the array without it, so I get my data back online in the meantime...or is it more trouble than it's worth, and I should just hang in there for NewEgg?

 

CD

  • Author

Yeah, I'm not much for reading SMART reports, but I guess common sense tells you when all categories report as "pre-fail" or "old-age", it's time to give up the fight...lol. Admittedly, this is a very old drive; a little 500G number from my very first MSS, that I've just been holding on to...tacked-on to the end of the array.

 

I guess your server is no place for nostalgia?  ;)

 

CD

Is the drive still showing as "red" on your management console?

 

You can just leave the drive "disabled" and wait for newegg, the contents will be simulated by parity and the remaining data drives.  Or, if you do not trust the other disks, or have critical files, now would be a really good time to re-evaluate your backup strategy.  (time to make copies elsewhere)

 

If you have spare capacity on other disks, you can copy the files off of the bad disk, then stop the array, un-assign it on the "Devices" page, then log in on the command line and use the "initconfig" command to set a new initial configuration without the failed drive. That command will immediately invalidate the current parity calculations, and when you next "Start" the array it will calculate parity on the new disk configuration (the one without the failed drive).

 

A new disk configuration with good parity, and no failed drives will allow you the time to shop for a sale on a disk or wait for the UPS guy to deliver a disk from Newegg.  Otherwise, don't wait for a sale... spend the few dollars more and get your data protected once more.

 

Joe L.
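
As a rough illustration of how a disabled disk's contents can be simulated from parity and the remaining data drives, here is a minimal Python sketch. It assumes single XOR parity (one parity byte per byte position across all data disks). The tiny byte strings and the `xor_blocks` helper are made up for the example and are not unRAID code.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-byte "disks"; a real array works the same way per byte.
disk1 = bytes([0x11, 0x22, 0x33, 0x44])
disk2 = bytes([0xA0, 0xB0, 0xC0, 0xD0])
disk3 = bytes([0x0F, 0xF0, 0x55, 0xAA])  # pretend this one has failed

# Parity is computed while all data disks are healthy.
parity = xor_blocks([disk1, disk2, disk3])

# With disk3 missing, XOR-ing parity with the surviving disks gives it back.
emulated_disk3 = xor_blocks([parity, disk1, disk2])
print(emulated_disk3 == disk3)
```

Under this assumption, every read of the simulated disk has to read all the other disks, and one parity value per byte can only solve for one missing disk. That is why running with a disk simulated leaves no protection against a second failure.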

Yeah, I'm not much for reading SMART reports, but I guess common sense tells you when all categories report as "pre-fail" or "old-age", it's time to give up the fight...lol.

You are mis-interpreting the report.

 

The column with the values you are describing is "Parameter TYPE" 

Each parameter is categorized in type as one that may be an indication of either Old-age, or pre-failure.

 

A parameter is not failed unless its current normalized "VALUE" is equal to or below its affiliated failure THRESHOLD.

A value in the parameter "TYPE" column does not indicate the parameter has failed the SMART test. 

If a parameter has failed, it will say FAILING_NOW in the WHEN_FAILED column.

 

Joe L.
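
The rule described above (an attribute has failed only when its normalized VALUE is at or below its THRESHOLD) can be sketched in a few lines of Python. This is only an illustration: it assumes the ten-column layout of the attribute row quoted earlier in this thread, and the `SmartAttribute`, `parse_attribute_row` and `has_failed` names are invented for the example.

```python
from typing import NamedTuple

class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int        # current normalized VALUE
    worst: int
    threshold: int    # failure threshold
    attr_type: str    # Pre-fail or Old_age: a category, not a verdict
    when_failed: str  # "-" when the attribute has never failed
    raw_value: str

def parse_attribute_row(row: str) -> SmartAttribute:
    """Split one attribute row into its ten whitespace-separated columns."""
    fields = row.split(None, 9)
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return SmartAttribute(
        attr_id=int(fields[0]),
        name=fields[1],
        value=int(fields[3]),
        worst=int(fields[4]),
        threshold=int(fields[5]),
        attr_type=fields[6],
        when_failed=fields[8],
        raw_value=fields[9].strip(),
    )

def has_failed(attr: SmartAttribute) -> bool:
    """Failed only if the normalized VALUE is at or below the threshold."""
    return attr.value <= attr.threshold

# The row from the SMART report posted in this thread.
row = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2047"
attr = parse_attribute_row(row)
print(attr.name, "failed:", has_failed(attr), "raw:", attr.raw_value)
```

By the threshold rule alone this attribute has not failed (100 is well above 36), and the TYPE column plays no part in the check. The replace-it advice earlier in the thread came from the raw count of 2047 re-allocated sectors, which the normalized value does not yet reflect.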

  • Author

Ah, I see; D'oh. So I guess common sense doesn't help me read this report. I see the number of errors...like how you mention the reallocate sector count is 2047; but how are you supposed to know what's "bad"? Just experience?

 

Sorry, don't mean to turn this into a "how do you read SMART reports" thread; I'm sure I can FAQ or Google that somewhere.

 

CD

  • Author

No Joe, it's showing blue...like it's a new drive that needs to be initialized and put in the array? It would be nice to have my data back for the next 2-3 days while I wait on the replacement, but not essential. Whatever's easiest for recovery.

 

CD

  • Author

On a side note...as I said previously, this is my first hint of trouble with the server. I went back and forth between my starter ARK 4U-500 case, and went "hog-wild" with a Norco 4220...mostly because I wanted hot-swap, and at that point, I thought I was going to rip 40T...lol.

 

I changed my ripping strategy, to the point where the 20-drive Norco was overkill. So I sold it, and went back to my ARK...but now...as trouble rears its head...I really do miss the convenience of hot-swap. I tried a 5-in-3 cage for my ARK, but it didn't fit right.

 

Any reason to believe this case...which is a smokin' deal right now (~$65 delivered)...can't be configured for 15 hot-swap drives, like the Lian Li cases Lime-Tech builds?

 

http://www.newegg.com/Product/Product.aspx?Item=N82E16811112238&cm_re=lian_li_case-_-11-112-238-_-Product

 

CD

Unassign the drive on the devices page and you can then start the array with that disk simulated.

 

Just be aware that, while running simulated, another disk failure will cause complete loss of the data on both disks.

 

You can copy the data from the failed disk (while it is simulated) to the other disks and then do an initconfig on the command line to remove that disk permanently. This will get your parity protection back once the parity is rebuilt. That might be a good idea if you want to run the array for the next few days while waiting for a new disk.

 

You could also pull the SMART reports for all the other disks to see if you might have any other potential issues. However, often times a bad smart report is too late...

 

You need to decide how much risk you're willing to take. I personally would likely shut-down and wait for a new disk. My next reboot would be to rebuild the failed one.

 

Most cases have little "shelves" in their 5.25" bays for the drives to sit on. 5-in-3 cages have no slots to clear these shelves, so you have to bend them out of the way. Typically, a small hammer or a C-clamp will take care of them.

 

Peter

  • Author

Thanks Peter. I know maybe I'm trying to do too much at once here, but with a case rebuild on the horizon (yes, I jumped on that Lian Li deal), I'm thinking of waiting on the case...move over my parts...and rebuild on the new box.

 

If I've got the spare room, I can rebuild without a replacement drive, right? I mean, in the new case. I might just go from 6 down to 5 for the time being, if I can? That way, I can re-build with just one new cage to start.

CD

No Joe, it's showing blue...like it's a new drive that needs to be initialized and put in the array? It would be nice to have my data back for the next 2-3 days while I wait on the replacement, but not essential. Whatever's easiest for recovery.

Then you probably started the array when it was not connected, or was not responding, since it no longer knows its model/serial number.

 

If you start the array now, it will attempt to re-construct it.  You could give that a try.  Just click on "Start"

(You might need to check the "I'm sure" box under it to enable it.)

 

Or, you could un-assign it, since you know it is bad, and then click "Start" to start the array without it.  It will simulate the contents of it, allow you to get to your files, even those on it, and go from there.

 

Just DO NOT type initconfig to set a new disk configuration or you'll invalidate parity, and the simulation will be impossible, and the files on the un-readable disk will be lost.

 

When you install the new disk just power down, replace the old failing disk with the new, power up, and press "Start"

 

Joe L.

  • Author

OK, Joe; as I've said...I'm going to come back to this thread, and re-visit when I swap cases. But...I can unassign the "bad" drive...check the box, do the re-build...and my data should be intact on the remaining 5 drives, instead of the array of 6?

 

Then I can swap cases, and just run my array from the 5 remaining HDDs, until I add a second 5-in-3 cage, and start adding more storage?

 

CD

But...I can unassign the "bad" drive...check the box, do the re-build...and my data should be intact on the remaining 5 drives, instead of the array of 6?

If you do not set a new initial configuration your data will be intact on all 6 drives, one being simulated by reading all the others.

 

If you un-assign the failing disk, after taking any files off of it you might want, and type "initconfig" on the command line, it will be removed from your array and when you next start the array a new initial parity calculation will occur on the remaining data drives.  When it is done you'll have an array of 5 data disks instead of 6.

 

When the replacement disk arrives, you can add it as a new disk.  It will not have any data on it.

 

Joe L.
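
For the "take any files off of it" step that has to happen before `initconfig`, the thread does not say which tool to use. One possible approach, sketched here in Python, is to copy the tree and compare checksums afterwards. Everything in it is an assumption made for illustration: the `copy_and_verify` and `file_digest` helpers are hypothetical, the demo runs on a scratch directory rather than real disks, and Python would need to be available on whatever machine can see both the simulated disk and the destination.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copy_and_verify(source_root: Path, target_root: Path) -> list:
    """Copy every file under source_root; return source paths that did not verify."""
    mismatches = []
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        target = target_root / source.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and timestamps
        if file_digest(source) != file_digest(target):
            mismatches.append(source)
    return mismatches

if __name__ == "__main__":
    # Demo on a scratch directory standing in for the two disks.
    with tempfile.TemporaryDirectory() as scratch:
        source_root = Path(scratch) / "simulated_disk"
        target_root = Path(scratch) / "other_disk"
        (source_root / "Movies").mkdir(parents=True)
        (source_root / "Movies" / "sample.bin").write_bytes(b"example data")
        print("files that did not verify:", copy_and_verify(source_root, target_root))
```

On a real server the two roots would be the mount point of the simulated disk and a folder on a disk with free space. Only once the returned list is empty would it make sense to un-assign the failing disk and set the new configuration, since after that point the simulated contents are gone.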

Archived

This topic is now archived and is closed to further replies.
