Disk write error


I ran a parity check recently, and one of my disks was removed from the array because of write errors. Until then the disk had been handling reads and writes for one of my virtual machines without problems, so the write errors came as a surprise. The strangest part is the syslog output, which I haven't seen before (I've had disk errors and failures before, but never this type of output). The output just repeats over and over, never changing, except for the write errors in the parity check. I've attached a shortened syslog and the full syslog. I ran a SMART test, which the drive passed, so I'm rebuilding the disk right now. The HDD is a WD Green drive that's been running smoothly for about 1.5 years. The unRAID version on my server is 5.0rc12. Any ideas on why this happened, or what happened?

syslog_shortened.txt

Link to comment

You may be developing a bad spot on that disk.

Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263264/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263272/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263280/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263288/4, count: 1
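One way to judge whether the errors are clustered, which would fit a bad spot, is to pull the numbers out of the syslog and look at the spacing between them. The following is only a minimal sketch: it assumes the line format shown above and that the value before the slash is a sector number and the value after it is the disk number (the log pairs "/4" with "disk4", but that reading is an assumption). The function name `write_error_sectors` is made up for illustration.

```python
import re

# Matches lines such as:
#   ... kernel: handle_stripe write error: 1080263264/4, count: 1
# Assumption: first number = sector, second number = disk slot.
PATTERN = re.compile(r"handle_stripe write error: (\d+)/(\d+), count: (\d+)")


def write_error_sectors(log_text):
    """Return {disk_number: sorted list of sector numbers} found in log_text."""
    found = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            sector = int(match.group(1))
            disk = int(match.group(2))
            found.setdefault(disk, []).append(sector)
    return {disk: sorted(sectors) for disk, sectors in found.items()}


# The four handle_stripe lines quoted in this thread.
sample = """\
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263264/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263272/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263280/4, count: 1
Jun 23 22:18:48 ShadowOfIntent kernel: md: disk4 write error
Jun 23 22:18:48 ShadowOfIntent kernel: handle_stripe write error: 1080263288/4, count: 1
"""

sectors = write_error_sectors(sample)
# Distance between consecutive failing sectors on disk 4.
gaps = [b - a for a, b in zip(sectors[4], sectors[4][1:])]
print(sectors)
print(gaps)  # [8, 8, 8]
```

In the excerpt the numbers advance in steps of 8, so the failures are on consecutive stripes rather than scattered across the disk. That alone does not distinguish a bad spot from a drive that dropped out partway through the check.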

 

You ran a SMART test, but was it a long test?

smartctl -t long /dev/sdX
(replace /dev/sdX with the device name of the disk)

 

Attach the current SMART report.

I would probably bring the array to an idle state without emhttp and run smartctl -t long on the disk, then check the SMART report for its status.

 

 

 

Link to comment

Here's a short SMART report. I started a long one, but it hasn't finished yet for some reason. I'll have the long SMART test done tomorrow if it's still being slow.

Edit: I just got back to my server to find that the long SMART test still hadn't finished after 8 hours of 'running'. My server is also giving odd segfaults occasionally when the smartctl command is used.

 

 

Jun 25 15:32:36 ShadowOfIntent kernel: mdcmd (23): import 22 0,0 (unRAID engine)
Jun 25 15:32:36 ShadowOfIntent kernel: mdcmd (24): import 23 0,0 (unRAID engine)
Jun 25 15:32:36 ShadowOfIntent emhttp_event: driver_loaded (System)
Jun 25 15:32:36 ShadowOfIntent kernel: smartctl[12019]: segfault at 52f0d641 ip 080aa050 sp bfacd6a0 error 6 (Errors)
Jun 25 15:32:38 ShadowOfIntent emhttp: shcmd (14316): rmmod md-mod |$stuff$ logger (Other emhttp)
Jun 25 15:32:38 ShadowOfIntent emhttp: shcmd (14317): modprobe md-mod super=/boot/config/super.dat slots=24 |$stuff$ logger (unRAID engine)
Jun 25 15:32:38 ShadowOfIntent emhttp: shcmd (14318): udevadm settle (Other emhttp)
Jun 25 15:32:38 ShadowOfIntent kernel: md: unRAID driver removed (System)
Jun 25 15:32:38 ShadowOfIntent kernel: md: unRAID driver 2.1.5 installed (System)

smart.txt

Link to comment

This line in the SMART report indicates that the disk retracted its heads on power loss 195 times.

192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      195
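For anyone reading such a row by hand, the columns in the `smartctl -A` attribute table are ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED and RAW_VALUE, so the 195 at the end is the raw count. A small sketch that splits the row into named fields follows; the function name and dictionary keys are hypothetical, chosen only for illustration.

```python
# Column names of the smartctl -A attribute table, in order.
# The lower-case key names are illustrative, not part of smartctl.
COLUMNS = (
    "id", "name", "flag", "value", "worst",
    "thresh", "type", "updated", "when_failed", "raw_value",
)


def parse_smart_attribute(row):
    """Split one attribute row into a dict keyed by column name."""
    # Split on runs of whitespace, at most 9 times, so a raw value that
    # itself contains spaces stays together in the last field.
    fields = row.strip().split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError("unexpected attribute row: %r" % row)
    attr = dict(zip(COLUMNS, fields))
    attr["id"] = int(attr["id"])
    return attr


# The row quoted from the attached SMART report.
row = "192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      195"
attr = parse_smart_attribute(row)
print(attr["name"], attr["raw_value"])  # Power-Off_Retract_Count 195
```

The normalized VALUE and WORST of 200 against a THRESH of 000 mean the drive does not flag this attribute as failing; it is the raw count that stands out here.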

 

You might want to check and re-seat the power cable to the drive (or any splitters, drive trays, or backplanes involved).

If there is a loose connection or an intermittent cable, you'll find the disk failing intermittently.

 

With the array stopped but the drives spinning, you might gently move the cables. You should not hear any disk spin down or retract its heads. (I had a splitter in my server that was improperly crimped at its connectors, with nearly the same symptoms as yours. It causes "hair loss", as you'll pull your hair out trying to figure out what is happening.)

 

Joe L.

Link to comment

Archived

This topic is now archived and is closed to further replies.
