Array issue - parity disabled & data drive showing errors


I was downloading some files recently from the Internet to my unRAID network share (version 5.0-rc5, recently upgraded from 4.7 so I could use 3TB drives), and I started to get disk access errors in my download manager.  I checked the unRAID webpage and it's showing my parity drive (a 3TB Seagate that I just installed two weeks ago; I ran preclear on it before installing) is disabled and disk2 (a 1.5TB Seagate that's been in the array for a few years I'd guess) is showing quite a few errors (819).  I'm unsure as to how to proceed--if I rebuild the parity, I could be syncing the errors from disk2 and quite possibly corrupting files on that disk, and I don't think I can rebuild disk2 right now since the parity is disabled.  I have a spare 1.5TB drive but not another 3TB drive lying around that I could use.  Any ideas on how to proceed, hopefully without losing the data on disk2?  The syslog and screenshot files are too large to attach, so I've uploaded them here:

 

http://66.39.67.208/downloads/syslog-2012-06-30.txt

http://66.39.67.208/downloads/unRAID-2012-6-30.png

 

Thanks.


Unfortunately, the syslog probably grew too much and was cut, so the piece you attached is just the last piece.  It actually shows at the top of it the 2 drives being disabled, at which point you can completely ignore the errors which follow (about 99.9% of the file), and you can put little importance on the error counts showing on that UnRAID web page (which is good news of course).  I suspect that both drives are fine.  It appears the drives are closely associated, as sda and sdb (on sd 0:0:0:0 and sd 0:0:1:0, ata1.00 and ata1.01), which look like they may well be on the same drive channel.  That implies the strong possibility that they are SATA drives on a close pair of onboard ports using an IDE Emulation mode in the BIOS settings.  I don't know that for sure without seeing the earlier parts of your syslog, but I do strongly suspect it, especially since it is rather rare for 2 drives to fail at the same time.  Please check the BIOS settings on the next boot, and look for the SATA drive support, and make sure it is set to a native SATA mode, preferably AHCI if available.  This is one more reason to never use IDE Emulation, because under some circumstances, a failure on one drive will take down the entire channel, causing both drives to be disabled, even though there is nothing wrong with the other drive.  In native SATA modes (such as AHCI), each drive gets its own channel, and cannot take another drive down with it if it fails.  Without the previous part of the syslog, I cannot speculate further as to what actually happened, but I'm hopeful that on reboot, both drives should appear again.  Please provide another syslog when possible (after a reboot).
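For anyone who wants to check their own syslog for the same pattern, here is a minimal sketch in Python. It assumes the kernel names devices as `ata<port>.<device>` (as in the `ata1.00` and `ata1.01` mentioned above) and simply groups device numbers by port. The function name `shared_channels` and the sample lines are made up for illustration; they are not taken from the attached syslog. More than one device on a port only suggests a shared channel; it is a hint to go look at the BIOS setting, not proof.

```python
import re
from collections import defaultdict

# Devices appear as "ata<port>.<device>:"; two devices on one port
# (e.g. ata1.00 and ata1.01) suggest a shared master/slave style channel.
ATA_ID = re.compile(r"\bata(\d+)\.(\d+):")

def shared_channels(lines):
    """Return {port: sorted device numbers} for ports with 2+ devices."""
    devices = defaultdict(set)
    for line in lines:
        match = ATA_ID.search(line)
        if match:
            devices[int(match.group(1))].add(int(match.group(2)))
    return {port: sorted(devs) for port, devs in devices.items() if len(devs) > 1}

# Hypothetical sample lines, not copied from the poster's syslog.
sample = [
    "kernel: ata1.00: ATA-8: EXAMPLE-DRIVE-A, max UDMA/133",
    "kernel: ata1.01: ATA-8: EXAMPLE-DRIVE-B, max UDMA/133",
    "kernel: ata2.00: ATA-8: EXAMPLE-DRIVE-C, max UDMA/133",
]
print(shared_channels(sample))
```

To run it against a real log, pass the lines of the file instead of `sample`, for example `shared_channels(open("syslog.txt"))` with whatever path the syslog was saved to.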

 

The only clue as to what happened is a hint of a possible power issue.  Before rebooting, it would be a good idea to check the power connections to the drives: make sure none are loose, and that any power splitters are good quality, with clean leads and tight connections.  Might as well check the SATA cables and connections too while you are there.

 

Unfortunately, UnRAID will still believe both drives are bad, and will not automatically restore and start the array.  And as you said, you can no longer either rebuild Disk 2 or rebuild the Parity drive, with 2 drives down.  So this is the perfect time for the Trust My Array procedure (may need updating for v5?), which should restore the array to almost normal.  Once the array appears to be running OK, you will need to switch to Maintenance mode and use the Check Disk File systems procedure on Disk 2.  I don't trust the errors in that syslog piece, but it would still be a good idea to make sure the file system on Disk 2 is good.
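To see why nothing can be rebuilt with two drives down, here is a toy model in Python. It assumes single parity is a plain bytewise XOR across the data drives, which is how UnRAID's parity is usually described; the three-byte "disks" and the helper name `xor_blocks` are purely illustrative.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three tiny stand-in data disks and the parity computed from them.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0a\x0b\x0c"
disk3 = b"\xff\x00\x55"
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: XOR of parity and the survivors recovers it.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
assert rebuilt_disk2 == disk2

# Two unknowns (say disk2 and parity): the survivors only give
# disk1 XOR disk3, which carries no information about disk2.
survivors_only = xor_blocks([disk1, disk3])
assert survivors_only != disk2
```

With one unknown there is one equation and one answer. With both Disk 2 and Parity marked bad there is nothing left to solve from, so the only way forward is to tell the array to accept the drives as they are.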

 

Personally, I believe the above is what should be done, but I believe others will feel that running the Trust My Array procedure with the subsequent parity check may endanger your data on Disk 2.  I'll let them speak their opinions about this, probably to the effect that it would be good to abort the automatic parity check as soon as possible, take whatever measures are necessary to ensure data integrity, and only then run a correcting parity check.  I'd give others a day or two to express their opinions.  (Sorry to introduce some ambiguity!)


Rob,

 

Thanks for the advice.  I restarted the array and checked the connections; after restarting, the parity drive is a blue ball with status PARITY NOT VALID: DISK_DSBL_NEW and "New parity disk installed" is the array status message.  I tried the "Trust Your Parity" steps (without refreshing the unRAID web menu) with the parity check commands by Joe for version 5.0 (http://lime-technology.com/forum/index.php?topic=19385.0), but I'm getting an error message on the parity check.

 

root@NAS:~# cd /

root@NAS:/# initconfig

This will rename super.dat to super.bak, effectively clearing array configuration.

The array must be in the Stopped state and it is up to you to confirm this.

Are you sure you want to proceed? (type Yes if you do):Yes

Completed

root@NAS:/# /root/mdcmd set invalidslot 99

root@NAS:/# /root/mdcmd check

/root/mdcmd: line 11: echo: write error: Invalid argument

 

The syslog is below, but I'm not sure how to proceed at this point, as I can't seem to get the parity check to work.  If everything looks okay with disk2, I suppose I could run the parity sync, but would I want to run reiserfsck on disk2 first before doing the parity sync to make sure everything is fine?

 

http://66.39.67.208/downloads/syslog-2012-07-02.txt


Looks pretty good, all things considered.  I was right about the IDE Emulation, so please remember to reboot and change the BIOS settings, change the SATA support to a native SATA mode, preferably AHCI.  You have a C2SEA compatible board, which I'm sure supports AHCI.

 

The parity check cannot run unless ALL of the data drives and parity drive are valid, and your parity drive is not, at present.  I'm sorry to hear that the Trust My Array procedure appears to be broken currently with the latest v5 releases.  You will have to rebuild the parity drive instead.

 

It does not matter when you check and fix the Reiser file system on Disk 2, because once parity is rebuilt, it will be correct before and after the fixes, when run in Maintenance mode.  The one reason to run it now would be that reiserfsck may run faster, make corrections quicker (if necessary) while parity is not yet valid.  Your choice.  Definitely change to AHCI first, so any possible future failures of Disk 2 won't take the parity drive down again.
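The reason parity stays correct through the repairs can be sketched with the same toy XOR model. This assumes the file system fixes are written through the parity-protected array device (as they are in Maintenance mode) rather than to the raw disk; the helper names and two-byte "disks" are hypothetical.

```python
def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def write_through_array(disks, parity, index, new_data):
    """Replace disks[index] with new_data and return the matching new parity."""
    old_data = disks[index]
    disks[index] = new_data
    # Parity flips in exactly the bit positions where the data changed.
    return xor_bytes(parity, xor_bytes(old_data, new_data))

disks = [b"\x01\x02", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_bytes(xor_bytes(disks[0], disks[1]), disks[2])

# Pretend a file system repair rewrites the block on the second disk.
parity = write_through_array(disks, parity, 1, b"\x00\xff")

# Parity still equals the XOR of all data disks after the repair.
assert parity == xor_bytes(xor_bytes(disks[0], disks[1]), disks[2])
```

Every write costs an extra parity update, which is why the repair may go faster while no valid parity drive is assigned.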

 

My own suggestion:

1. Reboot and modify BIOS settings for AHCI

2. Start UnRAID, then with array stopped, unassign Parity drive

3. Click the Maintenance mode checkbox, then start the array

4. Do the Check Disk File systems on disk 2

5. Stop array, reassign Parity drive, make sure Maintenance mode box is not checked, start array and let it rebuild parity

6. Run full parity check [optional]
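Steps 5 and 6 can be pictured with a small Python sketch, again assuming plain XOR parity. `sync_parity` and `check_parity` are illustrative names, not UnRAID commands, and the real check works on disk blocks rather than single bytes, so the counts here are not comparable to what the web page reports.

```python
def sync_parity(disks):
    """Build parity from scratch as the XOR of every data disk."""
    parity = bytearray(len(disks[0]))
    for disk in disks:
        for i, byte in enumerate(disk):
            parity[i] ^= byte
    return parity

def check_parity(disks, parity, correct=True):
    """Count positions where stored parity is stale; optionally fix them."""
    expected = sync_parity(disks)
    mismatches = 0
    for i in range(len(parity)):
        if parity[i] != expected[i]:
            mismatches += 1
            if correct:
                parity[i] = expected[i]
    return mismatches

disks = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]
parity = sync_parity(disks)            # step 5: rebuild parity
first = check_parity(disks, parity)    # step 6: follow-up check
parity[2] ^= 0xFF                      # simulate one stale parity byte
second = check_parity(disks, parity)   # correcting check repairs it
third = check_parity(disks, parity)    # nothing left to correct
print(first, second, third)
```

In this model a check right after a rebuild finds nothing to fix, and a correcting check leaves parity in the same state a fresh rebuild would.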


Thanks for the info guys.  So last night I made the BIOS change on a reboot, and when I brought the array up, for some reason all the disks were unassigned (is that due to the mdcmd command not sticking?  I didn't have Joe's post when I did this last night).  I had the screenshot of the drive assignments so I just assigned them exactly the way they were before.  After doing that, and I can't remember the exact verbiage, but the web interface asked me essentially if I wanted to trust the array when I brought it up, so I did that and started to perform a parity check (not sync) in maintenance mode, with the checkbox to correct errors already checked and grayed out (so it appears I couldn't change that option even if I wanted to).  I let that run overnight, and the check was complete and had 2612467 parity updates when I looked this morning.  I just started the array normally, and everything seems to be working fine (though my syslog is below if anyone wants to alert me to any issues).  I haven't done a full parity sync (which I know takes around 18 hours or so)--is that necessary at this point after the parity check seems to have corrected any issues?

 

http://66.39.67.208/downloads/syslog-2012-07-03.txt
