Replacing failed yet unRAID says 'upgrading parity'?

April 24, 2012

Disk 2 in my array red-balled, so I shut down and replaced it with a spare 2TB HDD I had lying around.

When I booted up, it showed a blue ball next to disk 2 and said "disabled disk replaced" and "Start will bring the array on-line, start Data-Rebuild, and then expand the file system (if possible)." I clicked the check box saying "I'm sure I want to do this" and it looked like the drives were mounting, but instead of rebuilding the data onto the new disk, it then told me "New parity disk installed." and that clicking "Start will bring the array on-line and start Parity-Sync." Disk 2 is now red-balled.

My concern is that it's exactly the same parity disk I've had all along. There's been no change to cabling or anything else. I'm worried that if I click start, I'll actually lose all the data on disk 2.

If I fire up my tower with disk 2 disconnected, it says the configuration is valid and that disk 2 is "not installed". There's the option to click Start, with the note "Start will bring the array on-line (array will be unprotected)." When I click Start, it looks like the drives are mounting, but then it stops and goes back to the exact same screen.

I was just going to try replacing disk 2 with a brand new drive, and do the rebuild again, but if anyone knows what's going on or whether it would be safe to click Start on this second stage without losing any data, I'd really appreciate an explanation or some help.

Sorry for being a n00b, but I don't know how to post a syslog. I do know that I'm running version 4.5.4 though.

Thanks in advance,

Mike

PS - I've done a quick search and found 1 or 2 threads that seemed similar, but I couldn't really follow what the solution was.

http://lime-technology.com/forum/index.php?topic=15770.msg147005#msg147005

http://lime-technology.com/forum/index.php?topic=15088.0

April 24, 2012

Do NOT proceed without getting guidance from lime-technology.

You were right to be cautious. A mistake could wipe out the parity you need to reconstruct your failed drive. You most certainly do NOT want to start a parity sync.

Joe L.

April 24, 2012

Agree with Joe. Do not start the array while it is offering to rebuild parity. That option should not have appeared. The interface should have shown that the disk was being rebuilt, which would have taken something like 8+ hours. After that, your array would have been working again.

A syslog taken after you had checked the "yes, I want to rebuild the failed disk" and hit the start could have really helped. In fact, even if you're dealing with Limetech you probably will have to do those steps again and capture the syslog to gather some info to help with the issue.

I suspect you might have a controller or SATA cable or power cord with issues feeding that disk or just a bad disk. However, I can't say for sure. I just suspect this because it's possible that unRAID failed to rebuild the disk and that's why it returned with the red ball beside the disk again and then gave the option to build parity. It was wrong to return with the option to build parity but still, the disk failing to rebuild could be the start of the problem.

April 25, 2012

Author

Thanks guys!

I'm not sweating bullets as I fortunately backed up everything from disk 2 onto another unRAID server about 3 weeks ago and there is no critical data that will be lost if disk 2 disappears. Yes - I'm that paranoid about data loss, I actually have a backup server in case my redundancy fails.

If I can get away with starting the array and creating a new parity disk and then copying disk 2 back onto the tower from my other unRAID server, then that's fine with me. But I am worried that there might be an inherent instability in my main tower that I should be cautious of in case something like this were to happen again.

I'm trying to educate myself about how to get a syslog and think that unMenu could be my best option. But installing that is even testing me at the moment...

You guys say I should talk to Limetech. Do they check these boards? or is there another channel I should contact them on?

Thanks again!

April 25, 2012

Yes, it would be best to figure out what happened before giving up. It might be a bug or hardware issue that needs to be addressed.

Syslog - http://lime-technology.com/forum/index.php?topic=9880.0

Use email to contact Limetech. I'm not sure but I think a support email is given on their web site. [email protected] maybe?

April 25, 2012

Here's the contact info, if that helps:

http://lime-technology.com/company/contact

April 26, 2012

Author

Thanks for the info guys.

I ran SMART tests on the original red-balled disk2, and it said everything was fine, so I pre-cleared the disk and then went to the browser management screen.

But it still says that the disk is red-balled, even though it's been successfully tested and pre-cleared?

I click start and it said all the disks were Mounting, but as soon as I clicked refresh it was back at the same screen saying "Start will bring the array on-line (array will be unprotected)."

So I'm still confuzzled.

PS - Thanks for the info on how to do a syslog.

syslog.txt

April 26, 2012

I see this repeated a number of times;

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel:          res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

I believe this points to an error getting this drive to initialize. This looks to be the drive connected to the 4th port on a 4-port add-in SATA card. I don't know if this is causing any real issues or not. It might be OK to leave alone for now.

Then, I see this;

Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Unhandled sense code
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Result: hostbyte=0x00 driverbyte=0x08
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Sense Key : 0x3 [current] [descriptor]
Apr 26 19:41:20 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Apr 26 19:41:20 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Apr 26 19:41:20 Tower kernel:         00 00 00 b8 
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] ASC=0x11 ASCQ=0x4
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 00 60 00 01 c0 00
Apr 26 19:41:20 Tower kernel: end_request: I/O error, dev sde, sector 184
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 23
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 24
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 25
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 26
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 27
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 28
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 29
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 30
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 31
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 32

It looks like something is not happy with the sde device (WDC_WD20EARS-00MVWB0_WD-WCAZA4877769).
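The numbers in these log lines can be cross-checked with a little arithmetic. The sketch below is only an illustration and assumes 512-byte sectors and 4096-byte buffer blocks (neither is stated in the log); the helper names `parse_read10` and `sector_to_block` are made up for this example. It decodes the READ(10) command bytes from the `CDB:` line and converts the failing sector to a buffer block number.

```python
SECTOR_BYTES = 512   # assumption: logical sector size of the drive
BLOCK_BYTES = 4096   # assumption: size of one kernel buffer block


def parse_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes(int(part, 16) for part in cdb_hex.split())
    if cdb[0] != 0x28:
        raise ValueError("not a READ(10) command")
    start_lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    sector_count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return start_lba, sector_count


def sector_to_block(sector):
    """Convert a 512-byte sector number to the buffer block that contains it."""
    return sector * SECTOR_BYTES // BLOCK_BYTES


lba, count = parse_read10("28 00 00 00 00 60 00 01 c0 00")
print(lba, count, count * SECTOR_BYTES)  # 96 448 229376
print(sector_to_block(184))              # 23
print(0xB8)                              # 184
```

Under those assumptions the request was a 448-sector (229376-byte) read starting at sector 96, which matches the `ncq 229376 in` field, and the failing sector 184 (0xb8 in the `res` line) falls in logical block 23, the first block the kernel complains about.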

Then, this line;

Apr 26 19:41:23 Tower emhttp: get_fstype: open /dev/sde1: No such file or directory

This means that the partition sde1 is missing or can't be accessed. Something is screwed up in the MBR of this disk.

and then this line twice;

Apr 26 19:42:10 Tower kernel: md: do_run: lock_rdev error: -6

This line has been associated with a bad partition/drive. Basically, if the partition is bad then the unRAID md driver code can't associate the partition with the md device which leads to it throwing this error.
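As a side note, Linux kernel code conventionally reports failures as negated errno values, so the -6 here presumably corresponds to errno 6. A quick way to look up what that number means (this only decodes the number; it does not confirm what the md driver intended):

```python
import errno
import os


def describe_kernel_error(code):
    """Translate a negative kernel-style return code into its errno name and text."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, text = describe_kernel_error(-6)
print(name, "-", text)
```

On Linux, errno 6 is ENXIO ("No such device or address"), which would be consistent with the md driver being unable to open the partition.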

So, this leads to the question: which drive is sde? Hopefully, it is the replacement drive.
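One way to answer that from the console is to look at the symlinks under /dev/disk/by-id, which on most Linux systems point from model-and-serial names to the kernel device names. The following is a rough sketch that assumes that directory exists on the server; `device_ids` is a hypothetical helper name, not an unRAID tool.

```python
import os


def device_ids(by_id_dir="/dev/disk/by-id"):
    """Map kernel device names (e.g. 'sde') to the by-id names that point at them."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ata-<model>_<serial> -> ../../sde
        target = os.path.realpath(os.path.join(by_id_dir, name))
        mapping.setdefault(os.path.basename(target), []).append(name)
    return mapping


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-id"):
        for device, names in sorted(device_ids().items()):
            print(device, names)
```

The serial number in the by-id name can then be compared with the serial shown next to each disk slot on the management page.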

April 26, 2012

Post the SMART report.

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel:          res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

This is a media error showing an UNCorrectable sector on the disk. The SMART report will reflect this error.
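Since the same block repeats many times in the syslog, it can help to count the error flags per ATA port instead of reading them by eye. This is a minimal sketch that assumes the lines look like the ones quoted above; the regular expression and the function name `count_ata_errors` are illustrative, not part of any unRAID tool.

```python
import re
from collections import Counter

# Matches lines such as: "... kernel: ata4.00: error: { UNC }"
ATA_ERROR = re.compile(r"kernel: (ata\d+(?:\.\d+)?): error: \{ ([A-Z ]+?) \}")


def count_ata_errors(lines):
    """Count libata error flags (e.g. UNC) per ATA port in syslog lines."""
    counts = Counter()
    for line in lines:
        match = ATA_ERROR.search(line)
        if match:
            port = match.group(1)
            for flag in match.group(2).split():
                counts[(port, flag)] += 1
    return counts


sample = [
    "Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED",
    "Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }",
    "Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }",
]
print(count_ata_errors(sample))
```

To run it against a saved log, pass an open file instead of the sample list, for example `count_ata_errors(open("syslog.txt"))`.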

April 30, 2012

Author

Hi guys,

Thanks again for the feedback. Unfortunately sde is disk 5, so I think I've got a problem.

SMART report and syslog attached.

smart_D5.txt

smart_parity.txt

syslog_120430.txt

April 30, 2012

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       36

Error 228 occurred at disk power-on lifetime: 6899 hours (287 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 bf 00 00 e0  Error: UNC 8 sectors at LBA = 0x000000bf = 191
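For anyone who wants to check this attribute across several saved SMART reports, here is a small sketch that pulls the raw value of Current_Pending_Sector out of the attribute table. It assumes the report uses the ten-column layout shown above; the function names are made up for this example.

```python
def parse_smart_attributes(text):
    """Parse attribute rows (ID, name, flag, value, worst, thresh, ..., raw) from a SMART report."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry a hex flag such as 0x0032
        # in the third column; this skips register dumps and other numeric lines.
        if len(fields) >= 10 and fields[0].isdigit() and fields[2].startswith("0x"):
            attributes[fields[1]] = {
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }
    return attributes


def pending_sectors(text):
    """Return the raw Current_Pending_Sector count, or None if the row is absent."""
    row = parse_smart_attributes(text).get("Current_Pending_Sector")
    return int(row["raw"]) if row else None


report = """ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 36
40 51 08 bf 00 00 e0 Error: UNC 8 sectors at LBA = 0x000000bf = 191
"""
print(pending_sectors(report))  # 36
```

A non-zero raw value here means the drive has sectors it could not read and is waiting to remap, which lines up with the UNC errors in the syslog.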

April 30, 2012

Author

Thanks mbryanr

So I'm assuming I need to replace disk5 with a new one?

Or is there some way I can repair it?

April 30, 2012

Do you need the data on disk2? It appears you now have 1 failed disk and another disk failing.

May 2, 2012

Author

No - I have everything backed up on another server.

I think I might take the coward's way out and just start from scratch.

Oh well. Next time gadget.

May 2, 2012

Author

Just tried restarting my array....

Can't rebuild parity, can't really do anything.

If I replace both disk 2 and disk 5, should this problem be fixed?

syslog_120502.txt

May 2, 2012

Yes, replace the 2 bad disks and initialize the array and rebuild parity.
