Need help with problem EARS drive



I apologize in advance: this has been dumped on me and I don't have much time to research it, so sorry if the answer is readily available. I helped my brother build an unRAID server a few years ago and he has been growing it steadily. It has started giving him some major problems, such as telling him he can't write to it. Looking through the logs, I found a lot of the following:

 

Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [17760 22980 0x0 SD]
Jan  5 17:27:14 Tower kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=2, nr_items=131, free_space=1048 rdkey 
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-5150 search_by_key: invalid format found in block 91876771. Fsck?
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [17760 22981 0x0 SD]
Jan  5 17:27:14 Tower kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=2, nr_items=131, free_space=1048 rdkey 
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-5150 search_by_key: invalid format found in block 91876771. Fsck?
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [17760 22982 0x0 SD]
Jan  5 17:27:14 Tower kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=2, nr_items=131, free_space=1048 rdkey 
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-5150 search_by_key: invalid format found in block 91876771. Fsck?
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [17760 22983 0x0 SD]
Jan  5 17:27:14 Tower kernel: REISERFS warning: reiserfs-5089 is_internal: free space seems wrong: level=2, nr_items=131, free_space=1048 rdkey 
Jan  5 17:27:14 Tower kernel: REISERFS error (device md11): vs-5150 search_by_key: invalid format found in block 91876771. Fsck?
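
With a log this repetitive, it can be easier to count the distinct REISERFS error codes per device than to read every line. A minimal sketch, assuming the syslog lives at /var/log/syslog (that default path and the function name `summarize_reiserfs` are my assumptions, not something from the server in question):

```shell
#!/bin/sh
# Count REISERFS error lines per device and error code.
# Assumption: the log path is passed as $1, defaulting to /var/log/syslog.
summarize_reiserfs() {
    log="${1:-/var/log/syslog}"
    # Reduce each error line to "<device> <code>", e.g. "md11 vs-13070",
    # then count how often each pair occurs.
    sed -n 's/.*REISERFS error (device \([^)]*\)): \([^ ]*\) .*/\1 \2/p' "$log" \
        | sort | uniq -c
}
```

Calling `summarize_reiserfs /var/log/syslog` prints one line per device and code with its count, which makes it quick to see whether more than one md device is affected.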

 

I took a look at md11: it is a drive he recently added, a WD20EARS. I know about the need to jumper the drive (he is running 4.5.6), but he didn't. Now that the drive is in the system and has 1.1TB of data on it, what can be done? Is the missing jumper causing these errors? I've seen some posts indicating that the only issue with an unjumpered drive may be a small performance hit. So maybe we can just run reiserfsck on it, be done, and accept the small performance hit?

 

Would upgrading to 4.7 help at all? It seems like the setting for advanced format would change the starting sector for all drives, so I don't know that we can apply that to an existing build with 12 drives in it.

 

A couple of additional issues: possibly because of these errors, the server is operating very slowly. I watched remotely as he tried to pull some files off, and it is copying files to his desktop at approximately 1MB/s, so it would take FOREVER to pull 1.1TB of data off this drive to clear it off, jumper it, and start over. Additionally, he doesn't have any empty slots in the tower to put in a new drive and copy the files faster internally. The rest of the array is full, so there is no free space to move stuff around internally. I'm kind of at a loss on what to do about that. Maybe just pull the EARS drive, jumper it, reinsert it, and let it rebuild? He is a professional photographer and since this is his newest drive, it has important new photos on it that he really can't risk losing during an unprotected rebuild. I believe he has many of the files backed up elsewhere, but it would be a major PITA for something to go wrong.
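
To put a number on "FOREVER", here is a rough sketch of the arithmetic, assuming decimal units (1.1TB = 1,100,000MB) and a constant 1MB/s. The function name `estimate_hours` is just for illustration:

```shell
#!/bin/sh
# Estimate whole hours needed to copy size_mb megabytes at rate_mb_s MB/s.
# Integer arithmetic only, so the result is rounded down.
estimate_hours() {
    size_mb=$1
    rate_mb_s=$2
    seconds=$(( size_mb / rate_mb_s ))
    echo $(( seconds / 3600 ))
}

hours=$(estimate_hours 1100000 1)
echo "about $hours hours, or roughly $(( hours / 24 )) days"
```

That works out to a bit over 300 hours of continuous copying, assuming the rate never drops further.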

 

Any advice is appreciated. I guess I should have told him about the EARS drives, but it didn't occur to me.

Link to comment

The first step is to run reiserfsck on /dev/md11. Then run a parity check. I would update to 4.7 and then run another parity check. Under 4.7 you can clear the MBR on the unjumpered disk and then allow the data to be rebuilt on the drive using the correct alignment.
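
As background on what "correct alignment" means here: Advanced Format drives like the EARS use 4096-byte physical sectors while presenting 512-byte logical sectors. The sketch below checks whether a partition start lands on a 4096-byte boundary. The start sectors 63 (traditional layout) and 64 (aligned layout) are my assumptions for illustration and are not stated anywhere in this thread:

```shell
#!/bin/sh
# Report whether a partition start sector falls on a 4096-byte boundary.
# Assumptions: 512-byte logical sectors, 4096-byte physical sectors.
is_4k_aligned() {
    start_sector=$1
    offset_bytes=$(( start_sector * 512 ))
    if [ $(( offset_bytes % 4096 )) -eq 0 ]; then
        echo "sector $start_sector: aligned"
    else
        echo "sector $start_sector: misaligned"
    fi
}

is_4k_aligned 63   # assumed traditional start sector
is_4k_aligned 64   # assumed 4K-aligned start sector
```

A misaligned start means a 4096-byte file-system block can straddle two physical sectors, which costs performance but does not by itself corrupt data.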

 

This command will clear the MBR:

dd if=/dev/zero count=8 of=/dev/sdX

(where sdX = the device of the disk being "converted")
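
With dd's default 512-byte block size, count=8 overwrites the first 4096 bytes of the disk. If you want a safety net, you could save those sectors to a file before zeroing them. This is only a sketch: the function name and the backup path in the comments are hypothetical, and sdX is still a placeholder for the real device.

```shell
#!/bin/sh
# Save the first 8 sectors (8 x 512 = 4096 bytes) of a device to a file
# before they are overwritten. Both arguments are supplied by the caller.
backup_first_sectors() {
    dev=$1
    backup=$2
    dd if="$dev" of="$backup" bs=512 count=8 2>/dev/null
}

# Example with hypothetical paths (substitute the real device for sdX):
#   backup_first_sectors /dev/sdX /boot/sdX-first8.bin
# To put the saved sectors back later:
#   dd if=/boot/sdX-first8.bin of=/dev/sdX bs=512 count=8
```

Double-check the device name before running either dd, since writing to the wrong disk destroys its partition table.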

Link to comment

The jumper setting has absolutely nothing to do with those errors.  They are an indication of file-system corruption.

 

DO NOT CHANGE THE JUMPER AT THIS TIME.  (can't stress that enough)

 

Step 1.  Copy off any critical files to a different physical disk.  (another disk in the array is fine)

Step 2.  Follow the steps as described in the wiki to check and repair the file-system on /dev/md11.

 

I would not complicate the process by updating to 4.7 just yet.  And do not clear the drive unless ALL files have been copied elsewhere. (otherwise, they would be gone)

 

The wiki section is here: http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
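
For orientation only, the check roughly comes down to unmounting the disk and running reiserfsck in its read-only --check mode against the md device. The sketch below just prints the commands unless RUN=1 is set. The wrapper function and the RUN variable are my own illustration, and the wiki remains the authoritative procedure:

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) a read-only check of an unRAID md device.
# Assumptions: the array is started and nothing else is using the disk.
check_md_device() {
    dev=$1
    for cmd in "umount $dev" "reiserfsck --check $dev"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1
        else
            echo "would run: $cmd"
        fi
    done
}

check_md_device /dev/md11
```

Only move on to --fix-fixable or --rebuild-tree if the --check output tells you to, and always work on /dev/md11 rather than the raw disk so parity stays in sync.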

 

Joe L.

 

Link to comment

Since he doesn't have any room elsewhere in his array to back up the files before repairing the file system, and the external copy speeds are currently crippled, I was thinking of having him order a new 2TB drive and rebuild onto that. Then if something goes wrong during the rebuild, he'll at least be able to access the data on the existing drive. I just want to make sure we can rebuild properly without doing the reiserfsck first, or is that going to just screw up the file system on the new drive?

 

Thanks much!

Link to comment

I wanted to give an update. He ordered a new Hitachi 2TB drive and rebuilt onto that. Everything seemed OK last night, then I got another frantic call: it wouldn't let him write to the drive again. I checked the syslog and it looks like there is more file system corruption on (the new) disk11, which dropped it to read-only again. Seems like a bad connector to me. I had him shut down, pull and reattach all the SATA cables going into that Icy Dock (hard to tell which one goes to which drive), and blow out any dust in the slot. He said one of the cables did seem loose, so here's hoping. I'll repair the file system on it today.

Link to comment

Did you run reiserfsck before you replaced the drive? If not, the corrupt filesystem was rebuilt on the new drive and you still need to run reiserfsck. Parity works at the bit level so the corrupted file system is reflected in parity. This is not an indication of a failing drive. Sometimes file systems become corrupt.
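
To illustrate why the corruption follows the rebuild, here is a toy model of single XOR parity where each "disk" is one byte. The byte values are arbitrary examples, not data from this array:

```shell
#!/bin/sh
# Toy single-parity model: each "disk" holds one byte.
# Parity is the XOR of all data bytes, so a rebuild returns exactly the
# bits that were on the replaced disk, whether or not they formed a
# valid file system.
d1=165   # arbitrary example byte on disk 1
d2=60    # byte on the disk being replaced (pretend it is corrupt)
d3=15    # arbitrary example byte on disk 3

parity=$(( d1 ^ d2 ^ d3 ))

# Rebuild disk 2 from parity and the surviving disks.
rebuilt=$(( parity ^ d1 ^ d3 ))

echo "original=$d2 rebuilt=$rebuilt"
```

The rebuilt byte always equals the original one, so parity faithfully restores a damaged file system too. Only reiserfsck can repair the structure itself.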

Link to comment

That was one thing I wasn't sure of: whether it would rebuild with the bad file system or not. I didn't want to run reiserfsck on a potentially bad disk and have everything go to hell, so we rebuilt onto the new disk without that step. So it sounds like encountering the bad file system again might not have been due to a bad or loose cable after all; it was just rebuilt that way. I ran reiserfsck on the new disk, it made minimal changes, and everything appears to be stable now. Thanks for all the advice given along the way.

Link to comment
