
Read errors while rebuilding another disk


t3



The bottom line at the top: why does the rebuild process continue if a disk that is not being rebuilt shows read errors, and what exactly will be rebuilt in that case?

 

Shouldn't the process stop the very moment the integrity of the data is about to be compromised? In my opinion, the only safe way to handle such a situation is to save all data from the array and create it anew (after correcting the hardware problems)...
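To make the "what exactly will be rebuilt" question concrete: assuming the array uses single XOR parity, each block of the missing disk is the XOR of the parity block and the matching blocks of every other data disk. The minimal Python sketch below uses hypothetical function names (not taken from any product) and does not describe how any particular server software actually behaves. It only shows that a read error on any other disk leaves that stripe impossible to reconstruct, and it contrasts continuing past such a stripe with stopping at the first one.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_of(data_blocks: list[bytes]) -> bytes:
    """Single XOR parity: the parity block is the XOR of all data blocks in a stripe."""
    return xor_blocks(data_blocks)


def rebuild_block(parity: bytes, peers: list[Optional[bytes]]) -> Optional[bytes]:
    """Reconstruct the missing disk's block from parity plus all other data disks.

    A None entry in `peers` stands for a read error on that disk. Without its
    data the missing block cannot be computed, so None is returned.
    """
    if any(block is None for block in peers):
        return None
    return xor_blocks([parity, *peers])


def rebuild_disk(parity_stripes, peer_stripes, stop_on_error: bool):
    """Hypothetical rebuild loop over stripes.

    Returns (rebuilt, unrecoverable): rebuilt holds one entry per processed
    stripe (None where reconstruction was impossible), and unrecoverable
    lists the stripe indexes affected by a read error on another disk.
    With stop_on_error=True the loop halts at the first such stripe.
    """
    rebuilt, unrecoverable = [], []
    for index, (parity, peers) in enumerate(zip(parity_stripes, peer_stripes)):
        block = rebuild_block(parity, peers)
        if block is None:
            unrecoverable.append(index)
            if stop_on_error:
                break
        rebuilt.append(block)
    return rebuilt, unrecoverable


if __name__ == "__main__":
    disk0, disk1, disk2 = b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"
    parity = parity_of([disk0, disk1, disk2])
    # Pretend disk2 was replaced: rebuild it from parity, disk0 and disk1.
    print(rebuild_block(parity, [disk0, disk1]))  # b'\xaa\x01'
    # A read error on disk1 makes this stripe unrecoverable.
    print(rebuild_block(parity, [disk0, None]))   # None
```

The point of the sketch is that parity can only replace one missing disk per stripe: every stripe where another disk also fails to read has two unknowns, so whatever gets written to the new disk for that stripe cannot be the original data.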

 

 

For the record, here is what happened: the last parity check, about a week ago, was OK; the array was online and working, and one disk was failing (inaccessible). So I stopped the array, powered down, replaced the disk, booted, precleared the new disk, added it to the array, and started the rebuild.

 

The rebuild started, but almost immediately I noticed weird behavior: only one disk (not the one being rebuilt!) seemed to be active (judging from the drive LED), and the overview showed a * for its temperature, about 12536717262539547 (or so) writes(!) and about 200000 read errors. Rebuild performance was about 6 MB/s, and the syslog showed masses of read errors for this disk. So I tried to cancel the rebuild, but after I clicked "cancel rebuild", the server seemed to get slower and slower until it was close to unresponsive; the management web page had already stopped responding. I then tried to power down or shut down via telnet, but when I could log in at all (after long delays, or failures over several tries), it got stuck at some point without powering down, and the drive LED of the disk with the read errors indicated that read attempts were still ongoing. In the end I shut down hard with a 5-second power-button press.

 

As the only change was a mechanical one (replacing a disk), I thought it would be a good idea to check all the crappy SATA cables (I just hate them for being such a "secure" connection), the power cables, and the disk enclosures (the problematic disk is in a hot-swap bay).

 

After powering up again, everything (except, of course, the previously replaced disk) seemed fine, and the rebuild started automatically. This time all disk access LEDs were lit, rebuild performance was about 100 MB/s, the array was online, and random test reads from the disk that had shown the read errors before returned healthy data. So I let it proceed. By the next morning the rebuild was at 75%, but again there were read errors on the same disk as before; only this time there were "just" 2045 of them, occurring somewhere in the middle of the process.

 

I cancelled the rebuild again, this time without problems, and shut down the server. That's where I am now. Since that apparently was the cause once already, it may still be a slight contact problem with the enclosure, perhaps triggered just by drive vibration (the exact syslog errors look like bus problems, and the SMART status of the disk is OK anyway). To me it seems safe to try a few more times after changing some cabling or hot-swap bays (of course), as long as I don't add new data (of course); but opinions on that are welcome ;)

Archived

This topic is now archived and is closed to further replies.
