What have I done! red dots everywhere

gazz575 · May 25, 2010

I am really new to unRAID. I have been trying it out for the last 2 weeks, screwing up and rebuilding about 4 times, reading the forums and learning as I went along. I got confident enough and upgraded to the Pro version, building an unRAID array with 6 SATA drives including parity. All went well until I decided to add a 300GB IDE drive to the array. When I started up I had parity green, disks 1 and 2 red, and disk 3 blue. Disks 4 and 5 are green, along with the offending new IDE drive.

I removed the IDE drive from the array and restarted, with no luck. I physically disconnected it from the motherboard, still no luck. The Tower main page displays "too many wrong and/or missing disks" even though they are still there in the same order. I have data on disks 1 and 2 and nothing on the others. How can I just get unRAID to start a parity check, as no data should have changed? It's not giving me the option. I do not know anything about Linux; I just sort of decided to jump in the deep end.

I thought I knew my way around computers; now I think I'm just a dumb ass.

batfink · May 25, 2010

You may not have physically moved the disks, but have you checked that they are still assigned to the original location in the devices page? E.g., if disk 1 was "sda" before the new disk was added, is it still assigned that way now? What happens if you swap the assignment of disks 1 and 2?

Be sure not to start the array or a parity check before someone more knowledgeable has replied!!!

Joe L. · May 25, 2010

You could start by following the suggested technique given in the wiki under troubleshooting. Post a syslog.

Normally, I think what might have happened is that one of the two drives on the IDE cable was defective, affecting both drives on that cable, BUT, your original set of drives were SATA. It is possible your MB uses an IDE controller under it all, and pairs channels. I have no idea, since you did not post a syslog. It would show the pairings.

Both drives were taken off-line when writes to them failed. (Is it possible you dislodged the two SATA cables? Or that you have a bad or loose power splitter?)

Once a drive is taken off-line because of a "write" failure, it will not be put back online without your administration of the failure.

Typically, you would replace a single failed disk with another and reconstruct the failed drive onto the replacement.

In your case, with two failed drives it is more complicated. You must first fix the fault.

You can use "smart" reports to determine if the basic cabling to a drive is working.

Step 1. Post a syslog, do it before you reboot.

Step 2. Make a copy of your "config" folder. It will let you get back to the state you are in now if needed.
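As a rough illustration of Step 2, here is a minimal Python sketch that copies a "config" folder into a timestamped backup folder. It is not an unRAID tool: the function name `backup_config` and the backup folder naming are made up for this example, and it assumes you run it from a machine that has Python and can see the flash drive. The paths in the usage comment are assumptions too; adjust them to wherever your flash drive is mounted.

```python
import shutil
import time
from pathlib import Path


def backup_config(config_dir, backup_root):
    """Copy config_dir into a new timestamped folder under backup_root.

    Returns the path of the new backup folder. Nothing in the original
    config folder is modified.
    """
    src = Path(config_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"config-backup-{stamp}"
    # copytree refuses to overwrite an existing folder, so an older
    # backup is never silently replaced.
    shutil.copytree(src, dest)
    return dest


# Example call (hypothetical paths, adjust to your own setup):
# backup_config("/boot/config", "/boot")
```

Copying the folder by hand in a file manager does exactly the same job; the point is only to have the current state saved before anything else is changed.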

You cannot just perform a parity "check" if you've changed the disk configuration. You can't perform a parity check if you have failed drives. You've done both.

This has nothing to do with knowing Linux. You could be an expert in Linux and be clueless on how to proceed in this situation.

Your basic issue is the two disks that are "red". You need to get one or the other working again.

So. Power down.

Check your cabling

Get "smart" reports on all the drives.

Post a syslog

Joe L.

gazz575 · May 25, 2010

Well then I guess I'm screwed, because I have already powered down the array. Maybe I should just remove the 2 drives with data, rebuild a new unRAID array with the others, then move the data back onto those drives and add the removed drives afterward?

Joe L. · May 25, 2010

Power up and capture the syslog. It is as good as anything in this situation.

Joe L.

gazz575 · May 25, 2010

Here is the syslog just in case. I have checked all the cables and they seem fine; I am presently burning the IDE cable with thermite.

thanks

syslog.txt

Joe L. · May 25, 2010

It appears as if some of your drives are "paired" on simulated IDE controllers on the MB:

May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host5 (sdc) ST31000528AS_5VP50PDG

May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:1:0 host5 (sdd) SAMSUNG_HD753LJ_S13UJ1MQ328490

May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host6 (sde) WDC_WD10EACS-00D6B0_WD-WCAU42375728

May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:1:0 host6 (sdf) SAMSUNG_HD501LJ_S0MUJ1MP892737

May 25 21:46:45 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) ST3500320AS_9QM08JXA

May 25 21:46:45 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) ST3500320AS_9QM08L16

It is possible the IDE disk shares a controller with either sda or sdb. In any case, it appears as if all 6 of your currently connected drives are identifying themselves to the OS.
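To make the pairing easier to see, here is a minimal Python sketch that groups the six emhttp lines quoted above by PCI controller and host number. The function name `group_by_host` and the regular expression are illustrative assumptions based only on those six lines; other syslogs may be formatted differently.

```python
import re
from collections import defaultdict

# The six emhttp lines from the syslog excerpt above.
LOG = """\
May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 host5 (sdc) ST31000528AS_5VP50PDG
May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-0:0:1:0 host5 (sdd) SAMSUNG_HD753LJ_S13UJ1MQ328490
May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:0:0 host6 (sde) WDC_WD10EACS-00D6B0_WD-WCAU42375728
May 25 21:46:45 Tower emhttp: pci-0000:00:1f.2-scsi-1:0:1:0 host6 (sdf) SAMSUNG_HD501LJ_S0MUJ1MP892737
May 25 21:46:45 Tower emhttp: pci-0000:03:00.0-scsi-0:0:0:0 host0 (sda) ST3500320AS_9QM08JXA
May 25 21:46:45 Tower emhttp: pci-0000:03:00.0-scsi-1:0:0:0 host1 (sdb) ST3500320AS_9QM08L16
"""

# Assumed line shape: "<pci address>-scsi-<c:b:t:l> hostN (sdX) <model_serial>"
LINE = re.compile(r"emhttp: (pci-\S+)-scsi-\S+ (host\d+) \((sd[a-z]+)\) \S+")


def group_by_host(log_text):
    """Return {(pci_address, host): [device, ...]} for every emhttp line."""
    groups = defaultdict(list)
    for pci, host, dev in LINE.findall(log_text):
        groups[(pci, host)].append(dev)
    return dict(groups)


for (pci, host), devs in sorted(group_by_host(LOG).items()):
    note = "shares a host" if len(devs) > 1 else "alone on its host"
    print(f"{pci} {host}: {', '.join(devs)} ({note})")
```

For this excerpt the grouping puts sdc with sdd on host5 and sde with sdf on host6, while sda and sdb each sit alone on the second controller.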

According to this, disk1 and disk2 are now connected to different disk controller ports than they were originally. Also according to this, disk3 has been replaced by a different disk. Did you perhaps get the connectors mixed up? It really does not matter at this point, though. See below.

May 25 21:46:45 Tower kernel: md: import disk0: [8,32] (sdc) ST31000528AS 5VP50PDG offset: 63 size: 976762552

May 25 21:46:45 Tower kernel: md: import disk1: [8,64] (sde) WDC WD10EACS-00D WD-WCAU42375728 offset: 63 size: 976761496

May 25 21:46:45 Tower kernel: md: disk1 wrong

May 25 21:46:45 Tower kernel: md: import disk2: [8,48] (sdd) SAMSUNG HD753LJ S13UJ1MQ328490 offset: 63 size: 732573496

May 25 21:46:45 Tower kernel: md: disk2 wrong

May 25 21:46:45 Tower kernel: md: import disk3: [8,80] (sdf) SAMSUNG HD501LJ S0MUJ1MP892737 offset: 63 size: 488385496

May 25 21:46:45 Tower kernel: md: disk3 replaced

May 25 21:46:45 Tower kernel: md: import disk4: [8,0] (sda) ST3500320AS 9QM08JXA offset: 63 size: 488386552

May 25 21:46:45 Tower kernel: md: import disk5: [8,16] (sdb) ST3500320AS 9QM08L16 offset: 63 size: 488386552
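A small Python sketch that tabulates the md lines quoted above, so each slot's device, size and status can be read at a glance. The function name `summarize_md` and the regular expressions are assumptions drawn only from these lines. Treating a slot with no following status line as having no complaint is also an assumption, not documented md driver behaviour.

```python
import re

# The md driver lines from the syslog excerpt above.
LOG = """\
May 25 21:46:45 Tower kernel: md: import disk0: [8,32] (sdc) ST31000528AS 5VP50PDG offset: 63 size: 976762552
May 25 21:46:45 Tower kernel: md: import disk1: [8,64] (sde) WDC WD10EACS-00D WD-WCAU42375728 offset: 63 size: 976761496
May 25 21:46:45 Tower kernel: md: disk1 wrong
May 25 21:46:45 Tower kernel: md: import disk2: [8,48] (sdd) SAMSUNG HD753LJ S13UJ1MQ328490 offset: 63 size: 732573496
May 25 21:46:45 Tower kernel: md: disk2 wrong
May 25 21:46:45 Tower kernel: md: import disk3: [8,80] (sdf) SAMSUNG HD501LJ S0MUJ1MP892737 offset: 63 size: 488385496
May 25 21:46:45 Tower kernel: md: disk3 replaced
May 25 21:46:45 Tower kernel: md: import disk4: [8,0] (sda) ST3500320AS 9QM08JXA offset: 63 size: 488386552
May 25 21:46:45 Tower kernel: md: import disk5: [8,16] (sdb) ST3500320AS 9QM08L16 offset: 63 size: 488386552
"""

IMPORT = re.compile(r"md: import (disk\d+): \[\d+,\d+\] \((sd[a-z]+)\) .* size: (\d+)")
STATUS = re.compile(r"md: (disk\d+) (\w+)$")


def summarize_md(log_text):
    """Return {slot: {"dev", "size", "status"}}; status is None if no line follows."""
    slots = {}
    for raw in log_text.splitlines():
        line = raw.strip()
        m = IMPORT.search(line)
        if m:
            slots[m.group(1)] = {"dev": m.group(2), "size": int(m.group(3)), "status": None}
            continue
        m = STATUS.search(line)
        if m and m.group(1) in slots:
            slots[m.group(1)]["status"] = m.group(2)
    return slots


for slot, info in summarize_md(LOG).items():
    print(slot, info["dev"], info["size"], info["status"] or "no complaint logged")
```

In this excerpt only disk1, disk2 and disk3 carry a status line ("wrong", "wrong" and "replaced"); the other three slots are imported without comment.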

To get back to a good starting point, I think you can reset the array configuration and then press "Start" to begin computing parity on the existing set of drives. Once parity is calculated you can then try again with your IDE drive.

To reset your array configuration log in on the system console or via telnet and type:

initconfig

Then on the web-management page, refresh it, and press the "Start" button to begin the process of calculating parity on the array.

Joe L.

Joe L. · May 25, 2010

The reason the two disks are showing as "wrong" is that your motherboard BIOS has apparently added an HPA (Host Protected Area) to each, changing its size. They are no longer the same size as when they were added to the array. This is really bad, since the partition on the disk is potentially affected and so is the formatting. (The file-system might still expect to be able to use those last few sectors stolen by the HPA.)

You'll need to remove the HPA before those disks will work properly, but only after disabling the "feature" in the BIOS that is creating them.

May 25 21:46:45 Tower kernel: ata5: SATA max UDMA/133 cmd 0xd400 ctl 0xd800 bmdma 0xe400 irq 19

May 25 21:46:45 Tower kernel: ata6: SATA max UDMA/133 cmd 0xdc00 ctl 0xe000 bmdma 0xe408 irq 19

May 25 21:46:45 Tower kernel: ata5.00: ATA-8: ST31000528AS, CC3D, max UDMA/133

May 25 21:46:45 Tower kernel: ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)

May 25 21:46:45 Tower kernel: ata5.01: HPA detected: current 1465147055, native 1465149168

May 25 21:46:45 Tower kernel: ata5.01: ATA-7: SAMSUNG HD753LJ, 1AA01109, max UDMA7

May 25 21:46:45 Tower kernel: ata5.01: 1465147055 sectors, multi 16: LBA48 NCQ (depth 0/32)

May 25 21:46:45 Tower kernel: ata5.00: configured for UDMA/133

May 25 21:46:45 Tower kernel: ata5.01: configured for UDMA/133

May 25 21:46:45 Tower kernel: scsi 5:0:0:0: Direct-Access ATA ST31000528AS CC3D PQ: 0 ANSI: 5

May 25 21:46:45 Tower kernel: scsi 5:0:1:0: Direct-Access ATA SAMSUNG HD753LJ 1AA0 PQ: 0 ANSI: 5

May 25 21:46:45 Tower kernel: sd 5:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)

May 25 21:46:45 Tower kernel: sd 5:0:0:0: [sdc] Write Protect is off

May 25 21:46:45 Tower kernel: sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00

May 25 21:46:45 Tower kernel: sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

May 25 21:46:45 Tower kernel: sdc: sdc1

May 25 21:46:45 Tower kernel: sd 5:0:0:0: [sdc] Attached SCSI disk

May 25 21:46:45 Tower kernel: sd 5:0:1:0: [sdd] 1465147055 512-byte logical blocks: (750 GB/698 GiB)

May 25 21:46:45 Tower kernel: sd 5:0:1:0: [sdd] Write Protect is off

May 25 21:46:45 Tower kernel: sd 5:0:1:0: [sdd] Mode Sense: 00 3a 00 00

May 25 21:46:45 Tower kernel: sd 5:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

May 25 21:46:45 Tower kernel: sdd: sdd1

May 25 21:46:45 Tower kernel: sd 5:0:1:0: [sdd] Attached SCSI disk

May 25 21:46:45 Tower kernel: ata6.00: HPA detected: current 1953523055, native 1953525168

May 25 21:46:45 Tower kernel: ata6.00: ATA-8: WDC WD10EACS-00D6B0, 01.01A01, max UDMA/133

May 25 21:46:45 Tower kernel: ata6.00: 1953523055 sectors, multi 16: LBA48 NCQ (depth 0/32)

May 25 21:46:45 Tower kernel: ata6.01: HPA detected: current 976771055, native 976773168

May 25 21:46:45 Tower kernel: ata6.01: ATA-8: SAMSUNG HD501LJ, CR100-10, max UDMA7

May 25 21:46:45 Tower kernel: ata6.01: 976771055 sectors, multi 16: LBA48 NCQ (depth 0/32)

May 25 21:46:45 Tower kernel: ata6.00: configured for UDMA/133

May 25 21:46:45 Tower kernel: ata6.01: configured for UDMA/133

May 25 21:46:45 Tower kernel: scsi 6:0:0:0: Direct-Access ATA WDC WD10EACS-00D 01.0 PQ: 0 ANSI: 5
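To put a number on what the HPA has taken, here is a minimal Python sketch that reads the three "HPA detected" lines from the excerpt above and subtracts the current size from the native size. The function name `hpa_report` is made up for this example. The 512-byte sector size is taken from the "512-byte logical blocks" lines in the same log.

```python
import re

# The three "HPA detected" lines from the syslog excerpt above.
LOG = """\
May 25 21:46:45 Tower kernel: ata5.01: HPA detected: current 1465147055, native 1465149168
May 25 21:46:45 Tower kernel: ata6.00: HPA detected: current 1953523055, native 1953525168
May 25 21:46:45 Tower kernel: ata6.01: HPA detected: current 976771055, native 976773168
"""

HPA = re.compile(r"(ata\d+\.\d+): HPA detected: current (\d+), native (\d+)")


def hpa_report(log_text, sector_bytes=512):
    """Return {port: hidden sector and byte counts} for each HPA line."""
    report = {}
    for port, current, native in HPA.findall(log_text):
        hidden = int(native) - int(current)
        report[port] = {
            "current": int(current),
            "native": int(native),
            "hidden_sectors": hidden,
            "hidden_bytes": hidden * sector_bytes,
        }
    return report


for port, info in hpa_report(LOG).items():
    print(f"{port}: {info['hidden_sectors']} sectors hidden "
          f"({info['hidden_bytes']} bytes)")
```

All three ports come out at the same figure: 2113 sectors, or 1,081,856 bytes (a little over 1 MiB), hidden at the end of each drive. Going by the model strings in the log, those ports are the SAMSUNG HD753LJ, the WDC WD10EACS and the SAMSUNG HD501LJ.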

This is a HUGE problem, and you should look in the wiki to learn about the HPA issue and at the least disable the "feature" in your Gigabyte BIOS. That feature, if enabled in the BIOS by default and disabled by you, is a ticking time-bomb. When the CMOS battery dies and the feature is re-enabled, it could corrupt the file-systems or, as in this case, take multiple disks off-line. You really need to update the MB BIOS (or the motherboard itself) to one where the feature is disabled by default.

See here: http://lime-technology.com/wiki/index.php?title=UnRAID_Topical_Index#HPA

Joe L.

gazz575 · May 25, 2010

Oh, that's just awesome. I've gone into the BIOS and can't find any setting to disable HPA; I have tried Ctrl-F4 for more options but can't see any. I have just downloaded the latest BIOS for the MB and am trying to find a floppy drive to plug into the board. I've read a few of the issues on HPA, so does this mean that the affected drives can still be recovered?

thanks again for taking the time to answer my call for help

Rajahal · May 25, 2010

The HPA feature is usually called something like 'Save a copy of BIOS to the hard drive'. Look for that and disable it.

Here is the long but thorough explanation as to how to remove HPA partitions from your drives:

http://lime-technology.com/forum/index.php?topic=5072.msg46903#msg46903

Don't do this yet, but you may have to do it at some point in the future.

You should be able to recover the data on the drives by using some other OS that understands the Reiser file system, but there is likely some data loss on certain files near the beginning of the disk.

I'll turn this back over to Joe now.

gazz575 · May 26, 2010

Thanks for the info. Just got back from work; will try to fix this now.

gazz575 · May 29, 2010

Hi all, I didn't have time to try fixing this during the week, but I have now swapped motherboards to another Gigabyte with the option of disabling HPA writing. Tower is now showing 2 drives, parity and disk 1, with red dots. I ran the hdparm command in accordance with the forum link, substituting my drive's native size, which seems to have worked, but unRAID is expecting the smaller HPA size. I then ran the initconfig command, which brought all the drives except parity to green, and have started a parity check. All seems to be looking well for now. Will post an update with a syslog when parity is completed.
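For anyone following along, here is a rough Python sketch of how the hdparm command lines mentioned above can be put together from a drive's native size. It only builds and prints the text; it does not run anything. It assumes the `-N` option described in the hdparm man page (with no value it reports the visible and native sector counts; with a `p` prefix on the value it sets the visible size permanently). Some hdparm versions may ask for an extra confirmation flag, so check the man page and the linked forum procedure before typing anything. The function name `hdparm_commands` and the device name `/dev/sdX` are placeholders; the native size shown is the one the syslog reported for the WDC WD10EACS.

```python
def hdparm_commands(device, native_sectors):
    """Build (query, restore) hdparm command strings for one drive.

    query   : reports current/native max sectors and whether an HPA is set
    restore : sets the visible size back to the native size permanently
    Nothing is executed here; the strings are only returned.
    """
    query = f"hdparm -N {device}"
    restore = f"hdparm -N p{int(native_sectors)} {device}"
    return query, restore


# /dev/sdX is a placeholder: device letters can change after a hardware
# swap, so confirm which drive is which before running anything.
query, restore = hdparm_commands("/dev/sdX", 1953525168)
print(query)
print(restore)
```

Running the query form first and comparing its numbers against the "HPA detected" lines in the syslog is a cheap way to confirm you have the right drive and the right native size.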

thanks Joe and Rajahal
