3 new hard drives failing during rebuild.


Treytor

I got a couple of new 5TB drives and replaced my 4TB parity drive with one of them. This went without a hitch. I then swapped out an older 2TB drive for the "old" but working 4TB parity drive. While it was populating and expanding the file system, it failed around the 2TB mark. I then tried a brand new 4TB drive. Same thing. I then tried a brand new 5TB drive, and it just happened again.

 

The port this drive is on runs off an 8-port Silicon Image controller, so replacing one of the two cables coming off the controller is not an ideal solution. I checked, cleaned, and re-seated them to make sure. I also cleaned the hot-swap bay the drive is in. I find it curious that the parity rebuild worked perfectly fine right before doing this.

 

I checked the Tunable (md_num_stripes), Tunable (md_write_limit), and Tunable (md_sync_window) settings. The first two were at their defaults and the third wasn't. I don't remember what it was set to before, but I changed it back to the default (384). The rebuild seemed to go faster, but it still failed. The only difference was that the drive failed after 320 errors instead of the usual 288.

 

Is it a possibility that there's an error happening somewhere else in my array?

 

I noticed these errors in the syslog a little while after the failure occurred while trying to do something else. Note Disk 5 is NOT the one that is being rebuilt:

 

Apr 12 20:16:15 Cooper kernel: REISERFS error (device md5): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [128045 1209 0x0 SD]
Apr 12 20:18:19 Cooper shfs/user: shfs_readdir: fstatat: UFC (13) Permission denied
Apr 12 20:18:19 Cooper shfs/user: shfs_readdir: readdir_r: /mnt/disk5/TV (13) Permission denied
Apr 12 20:18:19 Cooper shfs/user: shfs_readdir: fstatat: UFC (13) Permission denied
Apr 12 20:18:19 Cooper shfs/user: shfs_readdir: readdir_r: /mnt/disk5/TV (13) Permission denied
Apr 12 20:18:19 Cooper kernel: md: disk5 read error, sector=1464509760
Apr 12 20:18:19 Cooper kernel: REISERFS error (device md5): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [128045 1209 0x0 SD]
Apr 12 20:18:19 Cooper kernel: REISERFS error (device md5): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [128045 1209 0x0 SD]
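For anyone sifting through a longer syslog, lines like the ones above can be tallied per disk to see which drives are reporting read errors. This is only a rough sketch: the regular expression is based solely on the `md: diskN read error, sector=...` format shown in this post, and the function name `tally_read_errors` is made up for illustration.

```python
import re
from collections import Counter

# Matches md driver lines in the format quoted in this thread, e.g.
#   "... kernel: md: disk5 read error, sector=1464509760"
READ_ERROR = re.compile(r"kernel: md: disk(\d+) read error, sector=(\d+)")


def tally_read_errors(lines):
    """Return a Counter mapping disk number -> number of md read errors."""
    counts = Counter()
    for line in lines:
        match = READ_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Two lines copied from the syslog excerpt above.
sample = [
    "Apr 12 20:18:19 Cooper shfs/user: shfs_readdir: readdir_r: /mnt/disk5/TV (13) Permission denied",
    "Apr 12 20:18:19 Cooper kernel: md: disk5 read error, sector=1464509760",
]

for disk, count in sorted(tally_read_errors(sample).items()):
    print(f"disk{disk}: {count} read error(s)")
```

To run it against a real log, pass an open file instead of `sample`, for example `tally_read_errors(open("/var/log/syslog"))`. That path is an assumption about where the syslog lives on the server.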

 

I'm not sure if that's related, though.

 

I missed the syslog when the error happened, so I don't have that available right now. I could try again and fetch it if necessary.

 

Any ideas? Thanks!

Link to comment

Agreed, get full SMART reports.

 

Also, just for good measure, with the 5TB parity drive and all the old drives in place, run another parity check with error correction.

 

I would not do that. Let's see if we have any serious disk issues. A parity check with a failing drive can cause the drive to completely fail and make recovery more difficult / impossible.

Link to comment

Good call, my bad

Link to comment

A complete syslog might allow us to concentrate on a few drives. I would still recommend at least checking the SMART reports for all drives, though. Since you have posted in the v6 subforum, it is very easy to get a SMART report from the GUI. From Main, click on the drive, then Health - Disk attributes.

 

Known ATA S.M.A.R.T. attributes

 

Crap... I blew it. I'm actually running 5.0.6. Sorry guys! The rebuild is still going from my latest retry (I did one more reseat and a full system reboot) and it's around 70% now, which is farther than it's ever gotten before. So it may finish this time.

 

How would I go about getting a smart report on all the drives in 5.0.6?
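For reference, one common way to do this from the console is `smartctl -a /dev/sdX` from smartmontools, run once per drive. The sketch below loops that over every drive. It assumes both `smartctl` and Python 3.7+ are available on the server, which may not be true of a stock 5.0.6 install, and the helper names are made up for illustration. Drives behind some controllers may also need an extra `-d` device-type option, which is not handled here.

```python
import glob
import subprocess


def list_sd_devices():
    """List whole-disk device nodes such as /dev/sda (single-letter names only)."""
    return sorted(glob.glob("/dev/sd?"))


def smart_commands(devices):
    """Build one `smartctl -a <device>` command per device path."""
    return [["smartctl", "-a", dev] for dev in devices]


def collect_reports(devices):
    """Run smartctl for each device and return {device: report text}."""
    reports = {}
    for cmd in smart_commands(devices):
        # smartctl uses its exit status as a bit mask of findings, so a
        # non-zero status is not treated as a failure here.
        result = subprocess.run(cmd, capture_output=True, text=True)
        reports[cmd[-1]] = result.stdout
    return reports
```

Calling `collect_reports(list_sd_devices())` returns the report text keyed by device, which can then be written to files and attached to a post.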

 

Also, could a mod move this to the v5 subforum? Thanks guys!

Link to comment
