markguy

Troubleshooting: Disk failure?

So, I've got 4.5.b6 running (although it did this with 4.5.b5 and whatever the most recent version of 4.4(.2?) was). This is a new box, with four drives sitting in it (only three being used with basic version at the moment). Boots up fine, starts a parity check and then roughly 30% of the way through that, I get a weird clicking every couple of seconds, the web UI goes away and nothing short of yanking the cord will get the box to shut down. I can still telnet in, which seems fairly odd. The clicking doesn't sound like a drive failure (although I don't know what else it could be), just a click.  Oh, memtest went 20 cycles without an error.

 

EDIT: And the drive temps, when last I saw a report on them, were all less than 35C.

 

Thanks in advance!

 

pastebin of syslog.txt... and add ~15,000 more lines of the same error codes when you get to the end of that file. Even pastebin didn't want any part of that!

 

EDIT: Whoops. Hardware: Supermicro C2SEE, Intel Celeron E1400 (BX80557E1400), 2x1GB Crucial memory (CT2KIT12872BA1067), COOLMAX CU-700B 700W PSU.

This sounds to me like you have a write error on a non-parity disk, and so unRAID is stopping the server to prevent any irreparable damage from occurring. I was seeing the same thing recently. If I'm right, try to figure out which drive it is. It could just be the SATA/power cables, but fix the problem ASAP before this happens to you:

http://lime-technology.com/forum/index.php?topic=3785.0

 

 

It looks to me like you have a cabling problem to the drive. I would suspect that you are losing power, but it could be either cable.

Of course, the drive could be bad as well; the only way to tell is to run a smartctl report (see the troubleshooting link in my sig).

Change the data cable and connect the drive to a different power connector from the PSU. Hopefully that will take care of it. If you have a backplane in play, inspect it carefully for any loose connections.
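
As a rough illustration of what to look for in a smartctl report, the sketch below parses the attribute table that `smartctl -A` prints and picks out the counters most often associated with failing sectors. The sample text, the `WATCHED` list, and the function names are illustrative assumptions; none of the numbers come from the drive in this thread.

```python
# Attributes that commonly point at failing sectors (an assumed shortlist).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

# Made-up sample in the column layout printed by `smartctl -A /dev/sdX`.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       3
"""


def parse_smart_attributes(report):
    """Return {attribute_name: raw_value} for rows with a plain integer raw value."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


if __name__ == "__main__":
    for name, raw in suspicious(parse_smart_attributes(SAMPLE)).items():
        print(f"{name}: raw value {raw}")
```

Non-zero pending or uncorrectable counts would support the bad-drive theory, while a clean table makes cabling or power the more likely suspect. To try it on a real report, save the output of `smartctl -A` to a file and pass its contents to `parse_smart_attributes`.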

I've had this before. It would work for a period of time, then start clicking. I narrowed it down to a power splitter. I ran a different power connection to the drive, and the problem went away.

I'm leaning toward bad sectors on the drive (sdb, Samsung HD501LJ), both because of the clicking (very bad sign!) and because of the media errors with the UNC flag. There are suspicious elements in the error sequences (BMDMA and 'Unhandled sense code'), but the uncorrectable media errors combined with the clicking sound more like failing sectors on the drive. As Brian said, get the SMART report for a more definitive answer.

 

Minor point: you probably have a jumper on the Seagate 1TB drive. Check the "Remove SATA150 Jumper" section of Improving unRAID Performance.

 

Minor point 2: ACPI had a small issue (worked around) while starting. You might keep an eye out for a BIOS upgrade.
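
With roughly 15,000 repeated lines in the syslog, it can help to tally the error flags per ATA device rather than read them one by one. The following is a minimal sketch, assuming the usual libata line shape `ataN.NN: error: { FLAG }`; the sample lines, timestamps, and hostname are made up for illustration and are not taken from the posted syslog.

```python
import re
from collections import Counter

# Matches libata lines such as "ata2.00: error: { UNC }" (assumed format).
ERROR_RE = re.compile(r"(ata\d+(?:\.\d+)?): error: \{ ([A-Z ]+) \}")

# Made-up sample lines; a real run would read the saved syslog instead.
SAMPLE_LOG = """\
Nov  1 12:00:01 Tower kernel: ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Nov  1 12:00:01 Tower kernel: ata2.00: status: { DRDY ERR }
Nov  1 12:00:01 Tower kernel: ata2.00: error: { UNC }
Nov  1 12:00:05 Tower kernel: ata2.00: error: { UNC }
Nov  1 12:00:09 Tower kernel: ata3.00: error: { ABRT }
"""


def count_ata_errors(log_text):
    """Count error flags per ATA device, keyed by (device, flag)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            device = match.group(1)
            for flag in match.group(2).split():
                counts[(device, flag)] += 1
    return counts


if __name__ == "__main__":
    for (device, flag), n in count_ata_errors(SAMPLE_LOG).most_common():
        print(f"{device} {flag}: {n}")
```

If nearly all of the UNC errors land on a single device, that points at one drive (or its cable) rather than a controller-wide problem.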

Thanks for the help, folks. I switched out the Samsung drive, checked the cabling as well as I could, and things seem to be working now (parity check finished with no errors). I have the smartctl reports, just as a measure to ensure things are as stable as I can make them, but will have to put those up later. I'm out the door on a series of errands to... no doubt a futile effort... prepare for the arrival of our third kid next week.

 

Thanks again!
