Large Disk Failure Help


tential

Oh crap, good point on the SSD front, I have SSDs I'm going to add as cache drives now that I understand unraid better and realize it can do a LOT of my server tasks.  I guess I really need that card to get the best performance out of my SSDs.

Edit: Is it ok to link?  I see a new card for $60 (the SAS2008) and want to make sure I've got the right thing.  I didn't realize I couldn't get it on Amazon and had to go through eBay.  Even with this though, I'll still have 3 HDDs on a separate controller card once I add in both of my SSDs (17 SATA ports used: 13 data, 2 parity, 2 SSD cache).

Edited by tential
Link to comment
24 minutes ago, johnnie.black said:

You can still connect the SSDs to the onboard ports.

 

Yes. So I'll use 2 of the 6 onboard ports for the SSDs,

4 of the 6 for my HDDs,

8 from the LSI add-on,

and use the 4-port add-on card again for the remaining drives.
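
To double-check that port plan, here is a quick tally as a minimal sketch. The controller names are just labels made up for this example; the counts (6 onboard, 8 on the LSI card, 4 on the add-on card, 13 data + 2 parity + 2 SSD cache) are the ones mentioned in this thread.

```python
# Ports available per controller (labels are hypothetical, counts are from the thread).
PORTS = {"onboard": 6, "lsi_sas2008": 8, "addon_4port": 4}

SSDS = 2          # cache SSDs
HDDS = 13 + 2     # 13 data drives + 2 parity drives


def plan_ports():
    """Put the SSDs on the onboard ports, fill onboard and LSI with HDDs,
    and send whatever is left to the 4-port add-on card."""
    plan = {
        "onboard": {"ssd": SSDS, "hdd": PORTS["onboard"] - SSDS},
        "lsi_sas2008": {"ssd": 0, "hdd": PORTS["lsi_sas2008"]},
    }
    placed_hdds = sum(used["hdd"] for used in plan.values())
    plan["addon_4port"] = {"ssd": 0, "hdd": HDDS - placed_hdds}
    return plan


plan = plan_ports()
for name, used in plan.items():
    total = used["ssd"] + used["hdd"]
    # No controller may be asked for more ports than it has.
    assert total <= PORTS[name], f"{name} is over capacity"
    print(f"{name}: {used['ssd']} SSD + {used['hdd']} HDD of {PORTS[name]} ports")
```

That leaves 3 HDDs on the 4-port card, which matches the count given earlier in the thread.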

 

Lol, it's been a lot of work but I feel like I'm almost there.

 

I thought I had to upgrade my CPU, but it looks like it can handle my dockers as well, so at least I don't have to change that!

I only intended to use transmission openvpn, sonarr, radarr and Jackett (maybe?  I don't know but it seems like I need this and with community apps it looks like it's simple to setup).

 

Hopefully that runs on my celeron dualcore!

Link to comment

@tential

 

First server kicking incident I remember being discussed. We did have someone who had a server in their guest room and had a guest throw a blanket over it to quiet it down. And a couple of people have had floods. I never remember a fire, lightning strike, or anything like that. But eventually all things will happen. Kicking seems like it should be a relatively common scenario, although I expect most people would have no damage and not report it. My first suggestion to you is to relocate the server to a less kickable location!

 

Just thought I'd mention - your lack of parity was not the reason for this problem. It is good you are adding, but just sayin' ... UnRAID servers should rely on parity only on very rare occasions. It provides that little extra level of protection on top of (what should be) very reliable drives.

 

I would think kicking a server, even pretty hard, would not damage its drives. Now if you kick a tower and it clunked over onto its side, that's another matter. Damage (not just to the drives) would be more likely.

 

But unless a drive is literally reading or writing at the time of a jolt, it should have parked its heads. So if it received a jolt smaller than what it would take in shipping, it should be ok. But in the future, once dual parity is in place, if you kick it and up to 2 drives break, you'd be able to rebuild. Of course you are now the one least likely to kick their server in the future!
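
For anyone wondering how a rebuild from parity works at all, here is a minimal sketch of the single-parity case using plain XOR. The short byte strings standing in for disks and the function names are made up for illustration. The second parity drive uses a different calculation, which is not shown here.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing block: XOR of all survivors plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Three tiny stand-in "data disks".
disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x7f"]
parity = xor_parity(disks)

# Pretend the second disk died and rebuild it from the other two plus parity.
recovered = rebuild_missing([disks[0], disks[2]], parity)
print(recovered == disks[1])
```

With only one parity block this works for exactly one missing disk at a time, which is why a second parity drive is needed to survive two failures.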

Link to comment

Ya, I must have dislodged something, but when I first kicked I thought nothing of it.  Because I mean, that's not a big deal right?

It wasn't until I had issues writing/reading data that I thought it must be the case the drives were broken.

I also did think the drives were writing at the same time, since my server is constantly downloading and writing new files, and since the server was almost full it was distributing across drives, right?

 

The server was meant to be put into a custom cabinet enclosure, but well, I had been too lazy and daunted to build it.  Now it's built and almost complete.  Just finishing the cooling portion so nothing overheats.

 

I'm very happy to have come here though and learned more about unraid.  I always tell people it's amazing and useful, but now I have a far better understanding.

Link to comment
1 minute ago, tential said:

since the server was almost full it was distributing across drives right?

Lots of details in this devil.

 

Each user share has settings which control how it uses the disks. Go to the settings page in the webUI for one of your user shares, and turn on help. See what it says about Allocation Method, Split Level, Minimum Free. There is also more about these settings in the wiki.

Link to comment

High water is a good compromise: it balances disk usage without constantly switching drives just because one briefly has more free space than another. It is the default setting and the recommended setting unless you have some specific reason to use Fill Up.
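
A rough sketch of how High water picks a disk, as I understand the help text: the mark starts at half the size of the largest data disk, the lowest-numbered disk with free space above the mark is chosen, and the mark is halved when no disk qualifies. Treat that rule as an assumption to check against the help on your own share settings page. The function name and the sizes (in GB) are made up for illustration.

```python
def pick_high_water(free_space, disk_sizes):
    """Return the index of the disk High water would choose, or None.

    free_space and disk_sizes are lists of whole numbers (e.g. GB),
    ordered by disk number.
    """
    mark = max(disk_sizes) // 2          # initial high-water mark
    while True:
        for index, free in enumerate(free_space):
            if free > mark:              # lowest-numbered disk above the mark wins
                return index
        if mark == 0:                    # nothing has any free space left
            return None
        mark //= 2                       # lower the mark and look again


sizes = [8000, 8000, 4000]
print(pick_high_water([3000, 5000, 4000], sizes))  # only disk index 1 is above the 4000 mark
print(pick_high_water([3000, 3500, 4000], sizes))  # mark drops to 2000, so disk index 0 is used
```

Because the mark only moves when every disk has dropped below it, writes stay on one disk for a long stretch instead of bouncing between drives.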

 

The purpose of Minimum Free is to prevent a disk write from failing due to being out of space when there are other drives that have room. unRAID keeps each file completely on a single disk so if it runs out of space writing a file to one disk, the write will just fail. Also, it has no way to know how large a file will become when it begins to write the file. If a disk has more than Minimum Free, that disk can be chosen for beginning the file write, and if the file is too large, the write will fail. If a disk has less than Minimum Free remaining, a different disk will be chosen for beginning the file write.

 

You should set Minimum Free to larger than the largest single file you expect to write.
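
Here is a minimal sketch of why that matters. It only models the Minimum Free rule described above; picking the first eligible disk is a simplification assumed for this example (the real choice also depends on Allocation Method and Split Level), and the function names and sizes in GB are hypothetical.

```python
def choose_disk(free_space, minimum_free):
    """Return the index of the first disk with more than minimum_free, or None."""
    for index, free in enumerate(free_space):
        if free > minimum_free:
            return index
    return None


def write_file(free_space, minimum_free, file_size):
    """Pick a disk before the file size is known, then try to write the whole file there."""
    disk = choose_disk(free_space, minimum_free)
    if disk is None:
        return "no eligible disk"
    if file_size > free_space[disk]:
        # The file is never split across disks, so the write just fails.
        return f"write to disk {disk} failed: out of space"
    free_space[disk] -= file_size
    return f"wrote to disk {disk}"


# Disk 0 has 30 GB free, disk 1 has 500 GB free; the file turns out to be 50 GB.
print(write_file([30, 500], minimum_free=20, file_size=50))  # Minimum Free too small
print(write_file([30, 500], minimum_free=60, file_size=50))  # Minimum Free larger than the file
```

With Minimum Free at 20 the nearly full disk is still picked and the write fails. With Minimum Free at 60, larger than the file, that disk is skipped and the write lands on the disk with room.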

Link to comment

Looks like I've got a bunch of updating to do.  I pretty much plugged and played, which has worked for me

 

https://www.ebay.com/itm/LSI-SAS2008-8I-SATA-9211-8i-6Gbps-8-Ports-HBA-PCI-E-RAID-Controller-Card/122432373851?epid=1240682924&hash=item1c8189c45b:g:5H4AAOSwhQhY5Lpi

 

That's the controller card I was going to purchase. That's the right model, right?

Link to comment

Good to know.  So it's telling me now, after everything is done, that my 10th drive is faulty?  That's odd, because the parity check said 0 errors.

Here is my diagnostic,

what should I do?

tower-diagnostics-20180119-0354.zip

 

Smart test came back with no errors, that's good I hope!

 

Drive 10 is a drive that was in its own drive bay by itself.  I took it out and put it in the other drive bay that was almost full except for the bottom tray (hardest to reach).  I resecured the connection in case I knocked it loose while putting in the last 5 drives.  I'm hoping it was just another loose cable since the SMART test is ok, but I'll wait on help from y'all per the wiki on this subject.

Edited by tential
Link to comment
29 minutes ago, johnnie.black said:

SMART looks fine, but we'd need to see the diagnostics from when it was disabled; the ones you posted are from after rebooting. Do you know if the parity sync completed before it got disabled?

 

Yes

It was completed with no errors

Then I copied a bunch of files over and was about to do some more but noticed it was faulty and drive 10 was missing.

 

I realize I messed up and was supposed to do the diagnostic before.  What's my next step?

Link to comment

The one that is showing the error is the newest disk, barely installed with not much data on it.

I had just installed it before all of this. 

You're saying I should replace this drive with a spare or to rebuild with the same drive? 

The cables are new, but the drive connected is the hardest one for me to reach to connect.

It's so deep in that my hand can knock against the other power connectors if I'm not careful. 

Disk 10 felt like the power cable wasn't secure all the way when I checked all the cables again before rebooting.  It's the hardest disk to reach though, so maybe that was my imagination and I had knocked it loose checking it in the first place.

 

Is it ok to rebuild the same disk or do I need to open up a spare and rebuild to the spare instead?  Do I keep the "faulty" one?  It passes a SMART test so I imagine it's still good, right?

 

The emulated disk shows up fine.  Everything looks fine.

Edited by tential
Link to comment
You're saying I should replace this drive with a spare or to rebuild with the same drive? 

I now usually suggest rebuilding to a spare if one is available, just in case something goes wrong with the rebuild, like problems with a different disk, because if it does it might leave the rebuilt disk worse off than it was.

 

 

Link to comment
6 minutes ago, johnnie.black said:

Yes, the disk is most likely fine, but make a note, and if it fails again when you later add it to the server, it might have issues.

I keep a local Wiki where I can keep notes about different machines, disks etc. Just so I will be able to detect any repeat offenders. It also helps when keeping track of warranties etc, and important configuration/setup steps.

Link to comment

 

Ok I'll keep that in mind.

While my reboot is running:

For the SAS card:

https://www.ebay.com/itm/LSI-9211-8i-SAS-SATA-6Gb-s-8Ports-Controller-HBA-PCI-E-RAID-Card-Fully-Function/112682165520?hash=item1a3c616510:g:YgMAAOSw6DVaLFBw#viTabs_0

 

Is that ok/right thing to order?  I understand of course that this is no guarantee that it will actually work when I get it, just want to make sure I order the right thing at the start.

Thanks.

Link to comment
