Restarting server during rebuild process


I am in the middle of rebuilding a 2 TB drive in my unRAID server. I need to restart the server because some of the working drives have switched to read-only file systems. Before I do this, can anyone tell me whether the rebuild will resume from where it left off, or whether it will restart and take another couple of days?

 

Thanks

Link to comment

You need to supply a lot more detail. If some of the drives that are not being rebuilt are read-only, then I'm not confident that the rebuild will work. Even though rebuilding does not write to the other drives, their being read-only calls the integrity of your array into question.

 

How did you get to this point? Post a syslog and a screenshot.

Link to comment

These might be symptoms of a bigger problem, but I frequently lose the ability to copy to my unRAID server using Explorer in Windows, so I have switched to FTP. While cleaning up some old empty folders during the rebuild, I noticed they were not being deleted. I then went to the command line and tried to delete the unused folders there, and found that the file systems they were on reported read-only when I tried to delete them.
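One way to confirm which file systems have gone read-only, rather than discovering it one failed delete at a time, is to look at the mount options the kernel reports. The following is a minimal sketch, assuming the standard Linux `/proc/mounts` layout (device, mount point, type, options); the function name `read_only_mounts` is made up for illustration and is not part of unRAID.

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, _fstype, options = fields[:4]
        # Options are comma separated; 'ro' marks a read-only mount.
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found
```

On the server itself this could be called as `read_only_mounts(open("/proc/mounts").read())`. A data disk that was mounted read-write and now shows up in the result has most likely been remounted read-only by the kernel after an error.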

 

See attached screen shot.

 

Syslog is full of the following:

 

Jun 24 19:00:17 Server kernel: ata10.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 24 19:00:17 Server kernel: ata10.00: failed command: IDENTIFY DEVICE
Jun 24 19:00:17 Server kernel: ata10.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
Jun 24 19:00:17 Server kernel:          res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 24 19:00:17 Server kernel: ata10.00: status: { DRDY }
Jun 24 19:00:17 Server kernel: ata10: hard resetting link
Jun 24 19:00:17 Server kernel: ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jun 24 19:00:17 Server kernel: ata10.00: configured for UDMA/33
Jun 24 19:00:17 Server kernel: ata10: EH complete
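When the syslog is full of entries like these, a per-port tally makes it easier to see whether the trouble is confined to one link. This is a rough sketch that only recognizes the three message types visible in the excerpt above (exception, failed command, hard reset); the function name `count_ata_events` is illustrative, and the syslog path is left to the caller.

```python
import re
from collections import Counter

# Matches kernel lines such as "ata10.00: exception Emask ..." or
# "ata10: hard resetting link" and captures the port and the event type.
ATA_EVENT = re.compile(
    r"kernel: (ata\d+)(?:\.\d+)?: (exception|failed command|hard resetting link)"
)


def count_ata_events(lines):
    """Count (port, event) pairs across an iterable of syslog lines."""
    counts = Counter()
    for line in lines:
        match = ATA_EVENT.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing an open syslog file to `count_ata_events` gives a quick picture of which `ataN` port is resetting. If every event lands on a single port, that points at one drive, cable, or controller port rather than the array as a whole.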

 

I can attach more of the syslog if needed.

 

Thanks

Read-only.jpg.94a48c37f1c2f086dbfdb120bbb95743.jpg

Link to comment

Update: Through the command line I have been able to pin this down to one drive (Drive 3, ST3000DM001-9YN166_W1F0XFZW (sdb)). It is showing a lot of errors in the web UI. I don't think I am going to do anything yet, since another drive is still rebuilding; that has about 20 hours to go, but it is progressing. I have a feeling this other drive is going to fail as soon as the rebuild is done. I will update this thread after the rebuild has completed.

Link to comment

OK, update: the drive rebuild finished even though the UI died. I tailed the syslog to see when the rebuild was done. The previously mentioned HDD (sdb) is showing errors. I am going to search the forums about that, but as far as this question goes, I think I can call this thread done.
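For anyone who also loses the UI mid-rebuild, following the syslog can be scripted. The sketch below imitates `tail -f` and stops at the first line containing a marker string. The marker is deliberately a parameter: the exact message the kernel writes when a rebuild finishes is not quoted in this thread, so check your own syslog for it. Both function names are hypothetical.

```python
import time


def follow(path, poll_seconds=1.0):
    """Yield lines appended to a file after this call, similar to `tail -f`."""
    with open(path) as f:
        f.seek(0, 2)  # jump to the end so only new lines are reported
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_seconds)  # nothing new yet; wait and retry


def first_match(lines, marker):
    """Return the first line containing marker (newline stripped), or None."""
    for line in lines:
        if marker in line:
            return line.rstrip("\n")
    return None
```

A call such as `first_match(follow(syslog_path), marker)` blocks until the marker appears, where `syslog_path` and `marker` are whatever applies on your system.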

Link to comment

In general I would recommend not writing to the array while a rebuild is in progress. I have never done so.

 

As for whether it is OK to stop the array, restart, and have the rebuild resume, the answer is no. The rebuild would start over.

 

File system errors will not necessarily result in a bad rebuild, and if they did, nothing you can do will fix them and make it right. I would recommend that a rebuild in progress be allowed to complete unless something really bad is happening (smoke billowing out of the case, for example :)). Your best chance of a good recovery is based on the disks as they are. If you did attempt some type of file system correction, it would be critical to perform it on the disk's array device (/dev/mdX) and not its OS device (/dev/sdX), so that parity is continuously maintained.
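To make the /dev/mdX versus /dev/sdX distinction harder to get wrong in a script, a small guard can classify the path before any repair tool is run. This is only a sketch: the patterns assume array devices are named like /dev/md1 and raw disks like /dev/sdb or /dev/sdb1, which may not hold on every unRAID version, and `classify_device` is a made-up helper.

```python
import re

# Assumption: array devices are named /dev/md<number>.
ARRAY_DEVICE = re.compile(r"^/dev/md\d+$")
# Assumption: raw OS devices are named /dev/sd<letters>, optionally with a partition number.
RAW_DEVICE = re.compile(r"^/dev/sd[a-z]+\d*$")


def classify_device(path):
    """Return 'array', 'raw', or 'unknown' for a device path."""
    if ARRAY_DEVICE.match(path):
        return "array"  # writes go through the array driver, so parity is updated
    if RAW_DEVICE.match(path):
        return "raw"  # writes bypass the array driver and leave parity stale
    return "unknown"
```

A wrapper script could refuse to continue unless `classify_device(path) == "array"`.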

 

Apparent disk failures caused by cabling issues are much more common than real drive failures. And when a drive does fail, we open the case and remove it, which creates an opportunity for something to get jiggled loose and adds the risk that a drive will appear to fail during the rebuild. I highly recommend some sort of removable disk cage to reduce this risk, as well as locking SATA cables. There may be a tiny risk that the cage itself will create a problem, but experience here on the forums has shown that to be extremely unlikely, while rebuild problems after opening the case are fairly common.

Link to comment
