[SOLVED] 5.0B12, Array Issues, Troubles getting syslog


vinny.r

I've been stable on 5.0B12 since it was released. No problems. Using 3 x 2TB EARS drives on an MV8 card.

 

Suddenly yesterday the server was down. Could not telnet in, could not get to the web interface. Plugged in a monitor and got a blank screen. So I prayed, and rebooted.

 

On a reboot I can talk to the server again.  On the monitor I see the typical boot stages, no problems there.

 

The web interface says, starting...  Then it starts a Parity check.  Fair enough.   If I click 'refresh', the server locks up.  No more telnet, no more web interface, no more SMB.   Strange.... I left it like this overnight, thinking it just had to finish parity check.  No luck, in the morning, I was back to zero access.

 

So I tried to disable all the plugins in the go script, by putting a # in front of them. (Python, sab, sickbeard, simplemenu, etc)

 

Rebooted, and again, the same procedure. So now I just want a syslog. I type in the command to copy it to the flash. As soon as I hit enter, it locks up again. No telnet, no web interface, no SMB. I look on the flash drive, no syslog.

 

So I've tried rebooting about 4 times now. Each time, the server comes up, I can telnet and poke around a bit. But as soon as I try to enter commands, or open a file, or refresh the web interface status, the server locks up.

 

What's my next step?  Am I doing anything wrong in my troubleshooting method?  How can I get a useful syslog?

Link to comment

ok I managed to copy this from the tail of the syslog:

 

 

Sep 24 10:00:40 sayulitaserver kernel:          res 41/40:00:38:5b:83/00:00:02:00:00/40 Emask 0x409 (media error) <F>

Sep 24 10:00:40 sayulitaserver kernel: ata8.00: status: { DRDY ERR }

Sep 24 10:00:40 sayulitaserver kernel: ata8.00: error: { UNC }

Sep 24 10:00:40 sayulitaserver kernel: ata8.00: configured for UDMA/133

Sep 24 10:00:40 sayulitaserver kernel: ata8: EH complete

Sep 24 10:00:40 sayulitaserver kernel: ata9: sas eh calling libata port error handler

Sep 24 10:00:40 sayulitaserver kernel: sas: --- Exit sas_scsi_recover_host

Sep 24 10:01:11 sayulitaserver kernel: sas: command 0xf1afee40, task 0xf1942a00, timed out: BLK_EH_NOT_HANDLED

Sep 24 10:20:08 sayulitaserver in.telnetd[2535]: connect from 192.168.35.125 (192.168.35.125)

Sep 24 10:20:15 sayulitaserver login[2536]: ROOT LOGIN  on '/dev/pts/0' from '192.168.35.125'

 

Link to comment

That is an un-readable sector on one of your drives.
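
For anyone trying to read a syslog excerpt like the one above, here is a minimal Python sketch that groups the kernel's ATA messages by port and keeps the ones carrying the `UNC` (uncorrectable read) marker. The function name `summarize_ata_errors` and the regular expression are illustrative assumptions, not anything shipped with unRAID; the sample lines are copied from the post. It does not map a port such as `ata8` to a physical disk, which still has to be worked out from earlier boot messages in the full syslog.

```python
import re
from collections import defaultdict

# Matches kernel lines such as "... kernel: ata8.00: error: { UNC }".
# The pattern is an assumption based on the log lines quoted in this thread.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")


def summarize_ata_errors(lines):
    """Return {port: [messages]} for ATA lines that report a UNC error."""
    hits = defaultdict(list)
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        if "UNC" in message:
            hits[port].append(message.strip())
    return dict(hits)


sample = [
    "Sep 24 10:00:40 sayulitaserver kernel: ata8.00: status: { DRDY ERR }",
    "Sep 24 10:00:40 sayulitaserver kernel: ata8.00: error: { UNC }",
    "Sep 24 10:00:40 sayulitaserver kernel: ata8: EH complete",
    "Sep 24 10:00:40 sayulitaserver kernel: ata9: sas eh calling libata port error handler",
]

print(summarize_ata_errors(sample))
```

On the sample lines only `ata8` is reported, which matches the reading that one drive has an unreadable sector while the `ata9` line is ordinary error-handler chatter.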
Link to comment

ok.  For the moments that I can get onto the web interface, all drives have a green light next to them.  Is that still expected even with an unreadable sector?  And the valid SMART tests?

 

What is my next step then?

Create a post in the general support forum. Your issues have nothing to do with the 5.0 beta.

 

Start with the wiki.  There is a section just for starting up an array.  You have a lot of reading to do.

 

Joe L.

Link to comment

I was under the impression that the BLK_EH_NOT_HANDLED error with the MV8 cards was indeed a 5.0Bx issue.  I have been doing lots of reading, hours and hours in fact. 

 

I don't understand why you want me to create a new forum post, in the same area of the forum as this current post.  What will that accomplish?  Would that not be spamming the forum?

 

I have read the section about starting an array, and did not find anything in there that I didn't already know. I realize that moderating a forum can be frustrating, but your replies are really not helpful at all, and I question why you bothered typing. If you don't have anything helpful to say, just don't say anything at all. I know it's beta software, I am not expecting support nor do I feel owed any kind of stability. At the same time, don't treat me like a child.

 

There are dozens of posts about the error I am having, with the SAS card I am using, and this particular beta version of Unraid.  So I am wondering, is everyone with a MV8 card on 5.0B12 having this error?  Is there an issue with the linux kernel and this card's driver?  Or should I really be looking for hardware or power problems?

 

 

Link to comment

It might be a linux kernel and driver interaction issue.  Best way to test is to actually load up 4.7 on the server and see what happens then.

 

Joe L. was probably referring to the UNC and DRDY ERR lines, which are usually related to unreadable sectors.

Link to comment

I was referring to only the UNC errors... those are disk related and not unRAID version specific.  The other BLK_EH_NOT_HANDLED errors are indeed due to the drivers in the current 5.0  beta version of unRAID and do belong here.  Everybody with similar hardware is getting them.

 

Sorry for not being more clear.

 

 

Link to comment

Can I build another USB drive with 4.7 and switch back and forth between them?  As long as I keep the config folder synced between the 2 of them?

 

Strange that it can be stable for so long, then suddenly have this failure, when all drives still pass Smart testing.

 

Do we know if the BLK error other people are having is a dealbreaker for them? Or do their servers still function? For all I know I could have been having those errors for a while.

 

And Joe, in case you missed it, this IS the general support forum.  How can a thread not belong here, whether it is regarding the beta or not?

Link to comment

Ok.

 

I downgraded to 4.7.   Replaced the 2 files, and renamed the super.dat file, as well as the 2 password files.  

 

System came up fine.  Re-assigned the disks as they were before, and did an initconfig.

 

Shares came up fine, all the data came back fine.  But now I need to rebuild the parity.  No big deal, right?

 

20 minutes into the rebuild, I get a kernel panic, out of memory. Read up on that one, and it seems it's because there are so many errors that they are filling up the syslog.

 

Now I rebooted, cancelled the parity sync.

 

All my data now disappeared.  I stopped the array as soon as I saw the data was gone.

 

Can't help wishing I never messed with unraid!

Link to comment

Ok, I will do that. 

 

First I'm just going to try to recover the data onto another computer. 

 

Chances are my problems are hardware related, at least because of a previously-unknown incompatibility with this SAS card and the latest Unraid Betas.  Perhaps there should be a more specific warning on the beta release thread for this particular hardware.

 

From what I understand I should still be able to recover all my data onto a windows computer using a reiser file system tool.

Link to comment

There is a lot of discussion about such incompatibilities with SASLP and the latest beta.  If you had not read the thread then you may not have noticed it, but reading a beta release thread is something that should be done in any case.

Link to comment

It is becoming pretty evident that there are issues with these cards in the betas, and a lot of people use these cards. So maybe it's an idea to put a warning by the download link...

Or if that's not possible, the first post in the beta version threads could be edited to contain up-to-date known issues with the beta. That way people could read the first post for a summary of issues without having to go through an entire thread, even if it's just serious issues such as these cards. Just an idea anyway. It wouldn't take much to maintain if it's limited to serious issues, and it would save users a lot of time.

 

Actually the second post would be better, reserve that for issues as they become known. -just an idea :)

 

Link to comment

I hear ya.  When I downloaded the beta, the thread only had a couple of pages, and things were all good!

 

I know all about all the dangers of using beta software, trust me.  I've been using mostly beta software since I was 12 years old and I'm into my thirties now.  I can't think of any software I use daily that isn't beta.  Most of it works perfectly. 

 

I've never experienced so much data loss ever.  It's pretty tough to back up 6TB of video.  Where do you back it up to?  Any kind of solution is very expensive.  The point of the server is that you don't need a backup, it IS the backup.

 

I don't understand why every time I had a failure, I lost all my data. Reverting to 4.7 was hopeless, and I'm not the only one this has happened to.

 

The SAS card I was using is on the recommended hardware list, and seems quite common.  I had no problems and the server was rocking for quite a while, so I wasn't checking the forums. 

 

I don't place any blame on anyone here except myself.  But a warning system built into the thread would have come in handy.

Link to comment

The server is NOT the backup. This discussion happens all the time. It provides protection against a single disk failure. Never use beta driver software with mission critical data. A cost effective way to make a backup is to build a second unRAID server. If your data is worth it and a backup is required then unRAID is the perfect backup solution. I suggest that you use the stable release version of unRAID for a backup server for your primary unRAID server. It's safer yet to place the backup unRAID server at a remote location.
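
To illustrate why a single parity disk protects against exactly one disk failure, here is a small Python sketch of XOR parity over byte blocks. It is a toy model under the assumption that parity is a plain bitwise XOR across the data disks; the helper name `xor_blocks` and the three-byte "disks" are made up for the example and are not unRAID code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two tiny hypothetical data disks and the parity computed from them.
data_disks = [b"\x0f\xa0\x33", b"\xf0\x0a\x55"]
parity = xor_blocks(data_disks)

# Data disk 0 fails: XOR of the parity with the surviving disk rebuilds it.
rebuilt = xor_blocks([parity, data_disks[1]])
print(rebuilt == data_disks[0])
```

With one block missing, XOR of everything that survives gives the missing block back. With two blocks missing there is one equation and two unknowns, so nothing can be rebuilt, which is why parity is not a substitute for a separate backup.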

Link to comment

I highly doubt your data is gone. It sounds like in conjunction with the beta upgrade you may have a disk that is/was acting up. Before updating did you do a parity check?

Link to comment

You clearly have a hardware problem, but the forum members are not clairvoyants and you will have to provide the complete hardware configuration as a start.

 

Then there are some inconsistencies in your posts - you are talking about 3 x 2TB WD and then losing 6TB of data - that is clearly not possible in your scenario.

 

And I really do not understand the obsession with that card - any motherboard made in the last few years will have SATA ports that are superior to the ones provided by the SASLP card, so IMHO it will be only natural to start by using the motherboard ports and once you need more to go into additional expenses for the controller and the cables.

 

Since you have only 3 hard drives you can connect them to the MB to eliminate the 5B12 issue with the SM controller, and the Forum will try to guide you in recovering your data.

Once you boot - get the HD smart reports, post them here and if you succeed post the syslog.
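
The capacity point above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming one parity disk that is at least as large as the biggest data disk; the function name `usable_capacity_tb` is made up for the example.

```python
def usable_capacity_tb(disk_sizes_tb, parity_disks=1):
    """Sum the sizes of the data disks, treating the largest disks as parity."""
    sizes = sorted(disk_sizes_tb, reverse=True)
    return sum(sizes[parity_disks:])


# Three 2 TB drives in the array, one of them used for parity.
print(usable_capacity_tb([2, 2, 2]))
```

For three 2 TB drives this gives 4 TB of protected data space, not 6 TB.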

Link to comment

I am curious how you come to the conclusion that onboard sata ports would be superior to a purpose built hba?

I would think it's the other way around... perhaps not in the case of this particular card, though it looks like driver issues with the new kernels here, not a hardware fault.

Link to comment

9 times out of 10 SATA ports from the northbridge are going to be better than those on a purpose built HBA like the SASLP. This is because of the simple fact that the built in SATA ports will have to be the first supported for the motherboard to work and because the likelihood of the driver sucking is lower.

 

Obviously you can see how the purpose built HBA is working/not working with the latest driver in the linux kernel. Whether it is a hardware issue or not, the fact is that the HBA still does not work 100%.

Link to comment

I see your point in this environment - linux and a huge range of mobos...

Link to comment

The problem is my motherboard is a piece of crap. It's a Zotac NM10BE. It features 6 SATA ports in a mini-DTX format. The problem with that is that the controller sucks, and you can't actually use more than one hard drive reliably on the controller. It was hyped up to be an awesome server motherboard, but in usage it just never worked. You could get a config to stick for a couple of days, then it would crap out and start losing disks. So I simply disabled it, and installed the SAS card. I should have just replaced the motherboard, but the SAS card was a more expandable option for about the same money.

 

I don't have any mission critical data because I'm not a spy and I'm not assigned any missions. However, the data that I do have takes a while to get. Sure I can just get it again, but it takes a few weeks. I am not prepared to build 2 devices to store my data.

 

I had 3 2TB HDDs in the array. I have another 2TB in the machine but it's not part of the array, and it has copies of my most valuable data.

 

What I really need help with now is recovery.  I don't trust the hardware in the server so I just want to deal with the drives in another system.

 

All the data was there, then I rebuilt parity, it crashed.  Upon reboot, all the drives were empty.

 

All my other computers are on windows 7.  I can boot another linux os on cd or usb, if that helps.

 

I have enough hard drives to fit the data onto.  For now I'm going to follow the links in the faq and see where I get.

 

Thanks!

 

 

Link to comment

Ok I found the problem.

 

The 2 data drives are fine, the reiserfs partitions are ok, I can copy from them just fine.  The parity drive won't spin up.  It makes a nasty sound like a car that can't start.  I'm not lucky with the WD 2TB EARS drives, this is the second bad one I've had.

 

 

Link to comment

I had two WDs go bad recently within a week of each other, EARXs... but from what I have read the drives from each manufacturer are all much the same. They all use the same tech, manufactured in the same way, etc. Some people have bad runs with Seagate, some with WD, some with Hitachi. That is why we use systems like unRAID, I guess.

 

Yeah a drive that makes that kind of sound isn't good :)

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
