April 24, 2012

Disk 2 in my array red-balled, so I shut down and replaced it with a spare 2TB HDD I had lying around. When I booted up, it showed a blue ball next to disk 2 and said "disabled disk replaced" and "Start will bring the array on-line, start Data-Rebuild, and then expand the file system (if possible)." I clicked the check box saying "I'm sure I want to do this" and it looked like the drives were mounting, but instead of rebuilding the data onto the new disk, it then told me "New parity disk installed." and "Start will bring the array on-line and start Parity-Sync." Disk 2 is now red-balled.

My concern is that it's exactly the same parity disk I've had all along. There's been no change to cabling or anything else. I'm worried that if I click Start, I'll actually lose all the data on disk 2.

If I fire up my tower with disk 2 disconnected, it says the configuration is valid and that disk 2 is "not installed". There's the option to click Start, with the message "Start will bring the array on-line (array will be unprotected)." When I click Start, it looks like the drives are mounting, but then it stops and goes back to the exact same screen.

I was just going to try replacing disk 2 with a brand new drive and do the rebuild again, but if anyone knows what's going on, or whether it would be safe to click Start on this second stage without losing any data, I'd really appreciate an explanation or some help.

Sorry for being a n00b, but I don't know how to post a syslog. I do know that I'm running version 4.5.4 though.

Thanks in advance,
Mike

PS - I've done a quick search and found 1 or 2 threads that seemed similar, but I couldn't really follow what the solution was.
http://lime-technology.com/forum/index.php?topic=15770.msg147005#msg147005
http://lime-technology.com/forum/index.php?topic=15088.0
April 24, 2012

Do NOT proceed without getting guidance from lime-technology. You were right to be cautious; a mistake could wipe out the parity you need to reconstruct your failed drive.
You most certainly do NOT want to start a parity sync. Joe L.
April 24, 2012

Agree with Joe. Do not start the array while it is offering to rebuild parity. That option should not have appeared. The interface should have shown that the disk was being rebuilt, which would have taken something like 8+ hours, and your array would have been working afterwards.

A syslog taken after you had checked the "yes, I want to rebuild the failed disk" box and hit Start could have really helped. In fact, even if you're dealing with Limetech, you will probably have to do those steps again and capture the syslog to gather some info to help with the issue.

I suspect you might have a controller, SATA cable, or power cord with issues feeding that disk, or just a bad disk. However, I can't say for sure. I suspect this because it's possible that unRAID failed to rebuild the disk, and that's why it returned with the red ball beside the disk again and then gave the option to build parity. It was wrong to return with the option to build parity, but the disk failing to rebuild could still be the start of the problem.
April 25, 2012 (Author)

Thanks guys! I'm not sweating bullets, as I fortunately backed up everything from disk 2 onto another unRAID server about 3 weeks ago and there is no critical data that will be lost if disk 2 disappears. Yes - I'm that paranoid about data loss, I actually have a backup server in case my redundancy fails.

If I can get away with starting the array, creating a new parity disk, and then copying disk 2 back onto the tower from my other unRAID server, then that's fine with me. But I am worried that there might be an inherent instability in my main tower that I should be cautious of in case something like this were to happen again.

I'm trying to educate myself about how to get a syslog and think that unMenu could be my best option. But even installing that is testing me at the moment...

You guys say I should talk to Limetech. Do they check these boards, or is there another channel I should contact them on?

Thanks again!
April 25, 2012

Yes, it would be best to figure out what happened before giving up. It might be a bug or hardware issue that needs to be addressed.

Syslog - http://lime-technology.com/forum/index.php?topic=9880.0

Use email to contact Limetech. I'm not sure, but I think a support email address is given on their web site. [email protected] maybe?
April 25, 2012

Here's the contact info, if that helps: http://lime-technology.com/company/contact
April 26, 2012 (Author)

Thanks for the info guys.

I ran SMART tests on the original red-balled disk 2, and they said everything was fine, so I pre-cleared the disk and then went to the browser management screen. But it still says that the disk is red-balled, even though it's been successfully tested and pre-cleared?

I clicked Start and it said all the disks were mounting, but as soon as I clicked refresh it was back at the same screen saying "Start will bring the array on-line (array will be unprotected)."

So I'm still confuzzled.

PS - Thanks for the info on how to do a syslog.

syslog.txt
April 26, 2012

I see this repeated a number of times:

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel: res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

I believe this points to an error getting this drive to initialize. It looks to be the drive connected to the 4th port on a 4-port add-in SATA card. I don't know if this is causing any real issues or not. It might be OK to leave alone for now.

Then, I see this:

Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Unhandled sense code
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Result: hostbyte=0x00 driverbyte=0x08
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] Sense Key : 0x3 [current] [descriptor]
Apr 26 19:41:20 Tower kernel: Descriptor sense data with sense descriptors (in hex):
Apr 26 19:41:20 Tower kernel: 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 26 19:41:20 Tower kernel: 00 00 00 b8
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] ASC=0x11 ASCQ=0x4
Apr 26 19:41:20 Tower kernel: sd 3:0:0:0: [sde] CDB: cdb[0]=0x28: 28 00 00 00 00 60 00 01 c0 00
Apr 26 19:41:20 Tower kernel: end_request: I/O error, dev sde, sector 184
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 23
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 24
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 25
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 26
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 27
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 28
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 29
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 30
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 31
Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 32

It looks like something is not happy with the sde device (WDC_WD20EARS-00MVWB0_WD-WCAZA4877769).

Then, this line:

Apr 26 19:41:23 Tower emhttp: get_fstype: open /dev/sde1: No such file or directory

This means that the sde1 partition is missing or can't be accessed. Something is screwed up in the MBR of this disk.

And then this line, twice:

Apr 26 19:42:10 Tower kernel: md: do_run: lock_rdev error: -6

This line has been associated with a bad partition/drive. Basically, if the partition is bad, then the unRAID md driver code can't associate the partition with the md device, which leads to it throwing this error.

So, this leads to the question: which drive is sde? Hopefully, it is the replacement drive.
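For anyone wanting to repeat this kind of syslog scan, here is a minimal Python sketch that tallies the I/O error lines per device. The function names (`summarize_io_errors`, `sector_to_block`) are made up for illustration and are not part of unRAID or unMenu. The patterns only match the two message shapes quoted in this thread, and the block arithmetic assumes 512-byte sectors and 4096-byte buffer blocks, which is consistent with sector 184 showing up as logical block 23 but is an assumption all the same.

```python
import re
from collections import Counter

# Patterns for the two kernel message shapes quoted in this thread.
REQUEST_ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BUFFER_ERR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize_io_errors(lines):
    """Return (error count per device, first failing sector per device)."""
    counts = Counter()
    first_sector = {}
    for line in lines:
        match = REQUEST_ERR.search(line)
        if match:
            dev, sector = match.group(1), int(match.group(2))
            counts[dev] += 1
            first_sector.setdefault(dev, sector)
            continue
        match = BUFFER_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts, first_sector


def sector_to_block(sector, sector_size=512, block_size=4096):
    """Map a sector number to a buffer block (sizes are assumptions)."""
    return sector * sector_size // block_size


sample = [
    "Apr 26 19:41:20 Tower kernel: end_request: I/O error, dev sde, sector 184",
    "Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 23",
    "Apr 26 19:41:20 Tower kernel: Buffer I/O error on device sde, logical block 24",
    "Apr 26 19:41:23 Tower emhttp: get_fstype: open /dev/sde1: No such file or directory",
]
counts, first_sector = summarize_io_errors(sample)
print(dict(counts), first_sector, sector_to_block(first_sector["sde"]))
```

To run it against a real log, pass `open("syslog.txt")` instead of `sample`. A device that keeps turning up in the counts is the one to match against the disk assignments on the management screen.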
April 26, 2012

Post the SMART report.

Apr 26 19:41:20 Tower kernel: ata4.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
Apr 26 19:41:20 Tower kernel: ata4.00: irq_stat 0x00020002, device error via SDB FIS
Apr 26 19:41:20 Tower kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 26 19:41:20 Tower kernel: ata4.00: cmd 60/c0:00:60:00:00/01:00:00:00:00/40 tag 0 ncq 229376 in
Apr 26 19:41:20 Tower kernel: res 41/40:00:b8:00:00/d4:00:00:00:00/40 Emask 0x409 (media error) <F>
Apr 26 19:41:20 Tower kernel: ata4.00: status: { DRDY ERR }
Apr 26 19:41:20 Tower kernel: ata4.00: error: { UNC }
Apr 26 19:41:20 Tower kernel: ata4.00: configured for UDMA/100
Apr 26 19:41:20 Tower kernel: ata4: EH complete

This is a media error showing an UNCorrectable sector on the disk. The SMART report will reflect this error.
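As a rough illustration of how the `res 41/40:...` line lines up with the `{ DRDY ERR }` and `{ UNC }` lines, here is a small Python sketch that decodes the first two bytes as the ATA status and error registers. The bit tables only cover the common flags, and `decode_flags` and `parse_res` are hypothetical helper names. Reading the fourth byte (`b8`) as the low byte of the failing LBA is an assumption about the field layout; it does agree with the "sector 184" reported in the same syslog.

```python
# Common ATA status and error register bits (not an exhaustive list).
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_flags(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]


def parse_res(res_field):
    """Split a 'res' field like '41/40:00:b8:...' into decoded parts."""
    first_group = res_field.split("/")[0]            # '41'
    second_group = res_field.split("/")[1].split(":")  # ['40', '00', 'b8', ...]
    status = int(first_group, 16)
    error = int(second_group[0], 16)
    # Assumption: the third byte of this group is the low LBA byte.
    lba_low = int(second_group[2], 16)
    return {
        "status": decode_flags(status, STATUS_BITS),
        "error": decode_flags(error, ERROR_BITS),
        "lba_low": lba_low,
    }


result = parse_res("41/40:00:b8:00:00/d4:00:00:00:00/40")
print(result)
```

UNC set in the error register is what the kernel summarizes as "media error": the drive itself reported that it could not read the sector, which points at the disk surface rather than at the cable or controller.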
April 30, 2012 (Author)

Hi guys,

Thanks again for the feedback. Unfortunately sde is disk 5, so I think I've got a problem.

SMART report and syslog attached.

smart_D5.txt smart_parity.txt syslog_120430.txt
April 30, 2012

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED  WHEN_FAILED RAW_VALUE
197 Current_Pending_Sector  0x0032   200   200   000    Old_age  Always   -           36

Error 228 occurred at disk power-on lifetime: 6899 hours (287 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 bf 00 00 e0  Error: UNC 8 sectors at LBA = 0x000000bf = 191
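A small Python sketch of how the attribute row above could be checked by script. It assumes the usual ten-column smartctl attribute layout shown in the header; `parse_attribute_row` is a hypothetical helper, not part of smartmontools. It also shows a possible reason a disk can look "fine" at a glance: the normalized VALUE (200) is well above THRESH (0), so the attribute is not flagged as failed, while the raw count still shows 36 sectors pending.

```python
def parse_attribute_row(row):
    """Parse one whitespace-separated SMART attribute row into a dict."""
    fields = row.split()
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),   # normalized value
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "raw": int(fields[9]),     # raw count, here pending sectors
    }


row = "197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 36"
attr = parse_attribute_row(row)

# Not "failed" by the threshold test, but the raw count is non-zero.
below_threshold = attr["value"] <= attr["thresh"]
has_pending_sectors = attr["name"] == "Current_Pending_Sector" and attr["raw"] > 0

# The LBA in the error log entry, converted from hex.
failing_lba = int("0x000000bf", 16)

print(attr["raw"], below_threshold, has_pending_sectors, failing_lba)
```

A non-zero pending sector count together with logged UNC errors is the combination being pointed out here, whatever the overall health line of the report says.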
April 30, 2012 (Author)

Thanks mbryanr.

So I'm assuming I need to replace disk 5 with a new one? Or is there some way I can repair it?
April 30, 2012

Do you need the data on disk 2? It appears you now have 1 failed disk and another disk failing.
May 2, 2012 (Author)

No - I have everything backed up on another server. I think I might take the coward's way out and just start from scratch. Oh well. Next time, Gadget.
May 2, 2012 (Author)

Just tried restarting my array... I can't rebuild parity, and can't really do anything. If I replace both disk 2 and disk 5, should this problem be fixed?

syslog_120502.txt
This topic is now archived and is closed to further replies.