Replacing failed disk, yet unRAID says 'upgrading parity'?


Disk 2 in my array red-balled, so I shut down and replaced it with a spare 2TB HDD I had lying around.

 

When I booted up, it showed a blue ball next to disk 2 with the messages "disabled disk replaced" and "Start will bring the array on-line, start Data-Rebuild, and then expand the file system (if possible)." I ticked the check box saying "I'm sure I want to do this" and clicked Start. It looked like the drives were mounting, but instead of rebuilding the data onto the new disk, it then told me "New parity disk installed." and "Start will bring the array on-line and start Parity-Sync." Disk 2 is now red-balled.

 

My concern is that it's exactly the same parity disk I've had all along. There's been no change to cabling or anything else. I'm worried that if I click start, I'll actually lose all the data on disk 2.

 

If I fire up my tower with disk 2 disconnected, it says the configuration is valid and that disk 2 is "not installed". There's the option to click Start, with the message "Start will bring the array on-line (array will be unprotected)." When I click Start, it looks like the drives are mounting, but then it stops and goes back to the exact same screen.

 

I was just going to try replacing disk 2 with a brand new drive, and do the rebuild again, but if anyone knows what's going on or whether it would be safe to click Start on this second stage without losing any data, I'd really appreciate an explanation or some help.

 

Sorry for being a n00b, but I don't know how to post a syslog. I do know that I'm running version 4.5.4 though.

 

Thanks in advance,

Mike

 

PS - I've done a quick search and found 1 or 2 threads that seemed similar, but I couldn't really follow what the solution was.

http://lime-technology.com/forum/index.php?topic=15770.msg147005#msg147005

http://lime-technology.com/forum/index.php?topic=15088.0

 

Do NOT proceed without getting guidance from lime-technology. 

 

You were right to be cautious. A mistake could wipe out the parity you need to reconstruct your failed drive. You most certainly do NOT want to start a parity sync.
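
To illustrate why, here is a minimal Python sketch of single-parity reconstruction. It assumes the parity disk holds a plain bitwise XOR of the data disks (the general single-parity scheme); the byte values and function names are made up for illustration. A Data-Rebuild recomputes the missing disk from parity plus the surviving disks, while a Parity-Sync recomputes parity from whatever the data disks currently contain, which throws away the information about the failed disk.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bitwise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of the same 4-byte stripe on three data disks.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0xDE, 0xAD, 0xBE, 0xEF])  # the disk that later fails
disk3 = bytes([0x01, 0x02, 0x03, 0x04])

# Parity as it stood while all three disks were healthy.
parity = xor_blocks([disk1, disk2, disk3])

# Data-Rebuild: disk2 is recovered from parity plus the surviving disks.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])

# Parity-Sync with a blank replacement in slot 2: parity is overwritten
# with a value computed from the blank disk, so the old disk2 is gone.
blank = bytes(4)
new_parity = xor_blocks([disk1, blank, disk3])
lost_disk2 = xor_blocks([new_parity, disk1, disk3])
```

In this toy example `rebuilt_disk2` equals the original `disk2`, while `lost_disk2` is just the blank disk again.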

 

Joe L.

Agree with Joe. Do not start the array while it is offering to rebuild parity. That option should not have appeared. The interface should have shown that the disk was being rebuilt, which would have taken something like 8+ hours, and your array would have been working afterwards.

 

A syslog taken after you had checked the "yes, I want to rebuild the failed disk" box and hit Start would have really helped. In fact, even if you're dealing with Limetech, you will probably have to do those steps again and capture the syslog to gather some info to help with the issue.

 

I suspect you might have a controller or SATA cable or power cord with issues feeding that disk or just a bad disk. However, I can't say for sure. I just suspect this because it's possible that unRAID failed to rebuild the disk and that's why it returned with the red ball beside the disk again and then gave the option to build parity. It was wrong to return with the option to build parity but still, the disk failing to rebuild could be the start of the problem.

 

 

  • Author

Thanks guys!

 

I'm not sweating bullets as I fortunately backed up everything from disk 2 onto another unRAID server about 3 weeks ago and there is no critical data that will be lost if disk 2 disappears. Yes - I'm that paranoid about data loss, I actually have a backup server in case my redundancy fails.

 

If I can get away with starting the array and creating a new parity disk and then copying disk 2 back onto the tower from my other unRAID server, then that's fine with me. But I am worried that there might be an inherent instability in my main tower that I should be cautious of in case something like this were to happen again.

 

I'm trying to educate myself about how to get a syslog and think that unMenu could be my best option. But installing that is even testing me at the moment...

 

You guys say I should talk to Limetech. Do they check these boards, or is there another channel I should contact them on?

 

Thanks again!

  • Author

Thanks for the info guys.

 

I ran SMART tests on the original red-balled disk2, and it said everything was fine, so I pre-cleared the disk and then went to the browser management screen.

 

But it still says that the disk is red-balled, even though it's been successfully tested and pre-cleared?

 

I clicked Start and it said all the disks were mounting, but as soon as I clicked refresh it was back at the same screen saying "Start will bring the array on-line (array will be unprotected)."

 

So I'm still confuzzled.

 

PS - Thanks for the info on how to do a syslog.

 

syslog.txt

I see this repeated a number of times:

 

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel:          res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

 

I believe this is pointing to an error getting this drive to initialize. This looks to be the drive connected to the 4th port on a 4-port add-in SATA card. I don't know if this is causing any real issues or not. It might be OK to leave alone for now.
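
For anyone reading along, the res line can be decoded by hand. The sketch below assumes the usual libata layout for that field (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the standard ATA status and error bit names; the function and table names are mine and not part of any tool. If that layout holds, the command ended with DRDY and ERR set in the status register, UNC set in the error register, and LBA 184.

```python
import re

# Standard ATA status and error register bits (a subset).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}

RES_PATTERN = re.compile(r"res ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def flag_names(value, table):
    """List the names of the bits in `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def decode_res(line):
    """Decode the 'res' taskfile dump from a libata error line."""
    match = RES_PATTERN.search(line)
    if match is None:
        raise ValueError("no res field found")
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    (status, error, _nsect, lbal, lbam, lbah,
     _hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _device) = fields
    # 48-bit LBA: the high-order bytes come from the hob_* fields.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {
        "status": flag_names(status, STATUS_BITS),
        "error": flag_names(error, ERROR_BITS),
        "lba": lba,
    }


# The res line from the syslog excerpt above.
LOG_LINE = ("Apr 26 19:41:20 Tower kernel:          "
            "res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>")
decoded = decode_res(LOG_LINE)
```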

 

Then, I see this:

 

Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Unhandled sense code
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Result: hostbyte=0x00 driverbyte=0x08
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Sense Key : 0x3 [current] [descriptor]
Apr 26 19:41:20 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Apr 26 19:41:20 Tower kernel:         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
Apr 26 19:41:20 Tower kernel:         00 00 00 b8 
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] ASC=0x11 ASCQ=0x4
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 00 60 00 01 c0 00
Apr 26 19:41:20 Tower kernel: end_request: I/O error, dev sde, sector 184
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 23
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 24
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 25
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 26
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 27
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 28
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 29
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 30
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 31
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 32

 

It looks like something is not happy with the sde device (WDC_WD20EARS-00MVWB0_WD-WCAZA4877769).
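
The numbers in that part of the log line up if you assume 512-byte sectors and 4096-byte buffer blocks (both are assumptions on my part, the log does not state them). A rough Python sketch, with made-up helper names, that decodes the READ(10) CDB and maps the failing sector to the first buffer block reported:

```python
import struct

READ_10 = 0x28  # SCSI READ(10) opcode


def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != READ_10:
        raise ValueError("not a READ(10) CDB")
    # Bytes 2-5: big-endian LBA, byte 6: group number, bytes 7-8: length.
    lba, _group, length = struct.unpack(">IBH", cdb[2:9])
    return lba, length


def sector_to_block(sector, sector_size=512, block_size=4096):
    """Map a sector number to the buffer block that contains it."""
    return sector * sector_size // block_size


# Values copied from the syslog excerpt above.
start_lba, count = decode_read10("28 00 00 00 00 60 00 01 c0 00")
failed_sector = 184
first_bad_block = sector_to_block(failed_sector)
```

Under those assumptions the request starts at sector 96 and covers 448 sectors, the reported bad sector 184 falls inside that range, and it maps to logical block 23, the first block in the "Buffer I/O error" list.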

 

Then, this line:

 

Apr 26 19:41:23 Tower emhttp: get_fstype: open /dev/sde1: No such file or directory

 

This means that the sde1 partition is missing or can't be accessed. Something is screwed up in the MBR of this disk.

 

And then this line, twice:

 

Apr 26 19:42:10 Tower kernel: md: do_run: lock_rdev error: -6

 

This line has been associated with a bad partition/drive. Basically, if the partition is bad then the unRAID md driver code can't associate the partition with the md device which leads to it throwing this error.

 

So, this leads to the question: which drive is sde? Hopefully, it is the replacement drive.
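
One way to answer that, assuming the system exposes udev's /dev/disk/by-id symlinks, is to list which model and serial names point at each device node. This is only a sketch; the function name is made up, and the directory is a parameter so it can be tried against a test folder.

```python
import os


def devices_by_id(by_id_dir="/dev/disk/by-id"):
    """Map device names (e.g. 'sde') to the by-id link names that point at them."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # Resolve the symlink and keep only the device node name.
            target = os.path.basename(os.path.realpath(path))
            mapping.setdefault(target, []).append(name)
    return mapping
```

Calling `devices_by_id().get("sde")` on the server should then return the model/serial style names for that device, which can be compared with the serial printed on each drive's label.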

 

 

 

 

 

 

 

Post the SMART report.

 

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel:          res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

 

This is a media error showing an UNCorrectable sector on the disk. The SMART report will reflect this error.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       36
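
As far as I know, the overall SMART health verdict is driven by the normalized VALUE against THRESH, so a drive can "pass" while the raw count of pending sectors is non-zero. A small Python sketch, assuming the 10-column attribute layout shown above (the helper names and the choice of watched attributes are mine), that flags a line like this one:

```python
# Attributes whose non-zero RAW_VALUE usually deserves a closer look.
WATCHED = {5: "Reallocated_Sector_Ct",
           197: "Current_Pending_Sector",
           198: "Offline_Uncorrectable"}


def parse_attribute(line):
    """Parse one attribute row: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW."""
    parts = line.split()
    return {
        "id": int(parts[0]),
        "name": parts[1],
        "value": int(parts[3]),
        "worst": int(parts[4]),
        "thresh": int(parts[5]),
        "raw": int(parts[9]),
    }


def needs_attention(attr):
    """True when a watched attribute has a non-zero raw count."""
    return attr["id"] in WATCHED and attr["raw"] > 0


row = "197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -                      36"
attr = parse_attribute(row)
above_threshold = attr["value"] > attr["thresh"]  # why the health check can still pass
```

Here the normalized value (200) is well above the threshold (0), yet the raw count of 36 pending sectors is flagged by `needs_attention`.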

 

 

Error 228 occurred at disk power-on lifetime: 6899 hours (287 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 bf 00 00 e0  Error: UNC 8 sectors at LBA = 0x000000bf = 191
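
The register row can be cross-checked against the LBA printed next to it. A small sketch, assuming the classic 28-bit layout (low nibble of DH as LBA bits 24-27, then CH, CL and SN) and that bit 0x40 of ER is the UNC flag; the function name is made up:

```python
def decode_registers(row):
    """Decode the ER ST SC SN CL CH DH bytes from a SMART error log row."""
    er, _st, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split()[:7])
    # 28-bit LBA: DH low nibble is the top 4 bits, SN is the lowest byte.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "uncorrectable": bool(er & 0x40),
        "sector_count": sc,
        "lba": lba,
    }


row = "40 51 08 bf 00 00 e0  Error: UNC 8 sectors at LBA = 0x000000bf = 191"
registers = decode_registers(row)
```

With those assumptions the row decodes to an uncorrectable error on an 8-sector request, reported at LBA 191, which agrees with the text on the same line.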

 

  • Author

Thanks mbryanr

 

So I'm assuming I need to replace disk5 with a new one?

 

Or is there some way I can repair it?

Do you need the data on disk2? It appears you now have 1 failed disk and another disk failing.

  • Author

No - I have everything backed up on another server.

 

I think I might take the coward's way out and just start from scratch.

 

Oh well. Next time gadget.

  • Author

Just tried restarting my array....

 

Can't rebuild parity, can't really do anything.

 

If I replace both disk 2 and disk 5, should this problem be fixed?

syslog_120502.txt

Yes. Replace the two bad disks, initialize the array, and rebuild parity.

Archived

This topic is now archived and is closed to further replies.
