
Dual Parity Drive Errors and Replacing/Flashing an HBA


juanamingo


Posted

Hello all,

 

Hoping to get some definitive answers/advice for my issue. The <tl/dr> is at the end.

 

System specs:

Quote

 

MBD: GA-X99P-SLI

CPU: Xeon E5-2640 v3

RAM: 128 GB Corsair Vengeance DDR4

NIC: Intel X540-T2 10GbE

VID: Zotac GeForce GT 610

HBA: RocketRAID 2760A

HDD: 13 x 10TB IronWolf Pro

SSD: 2 x 2TB Samsung 850 EVO

CSE: SuperMicro 24-Bay 4U

PSU: 2 x SuperMicro 920W

 

 

I'm currently running a single RocketRAID 2760A card with 6 ports, all 6 of which are connected to the backplane of my SuperMicro CSE-846TQ-R900B. It worked out of the box; there's a setting in the motherboard BIOS (I forget which at the moment) that skips the card's BIOS screen.

 

I don't think it matters, but for completeness' sake (and to illustrate some of the troubleshooting steps I've taken), I had connected the breakout cables from the RocketRAID to the backplane top to bottom, left to right, like this:

C0P0 C1P2 C3P0 C4P2

C0P1 C1P3 C3P1 C4P3

C0P2 C2P0 C3P2 C5P0

C0P3 C2P1 C3P3 C5P1

C1P0 C2P2 C4P0 C5P2

C1P1 C2P3 C4P1 C5P3

 

using these cables, where C is the cable number and P is the port (breakout lead) on that cable.

 

My array consists of 8 x 10 TB Seagate IronWolf Pros for data and 2 x 10 TB IronWolf Pros as parity, plus 2 x 2TB Samsung 850 EVOs as cache drives.

There are also 3 x 10 TB IronWolf Pros that are pre-cleared and awaiting use.

 

Here is the issue I'm having. It has ALWAYS happened ONLY with the parity drives, never a data drive (I so hope I didn't just jinx myself).

 

After my first install of unRAID around Sept '17, everything was perfect. After about 10 days of uptime, the Parity drive was dropped due to errors (roughly 1200 on Parity; Parity 2 was OK).

 

I stopped the array, removed and re-added the drive, and re-ran the parity check. It ran at about 200-250 MB/s (with all Docker containers stopped and no other load on the system) and took about 16 hours.
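For a sense of scale, here is a back-of-the-envelope Python sketch. It assumes a nominal 10 TB (10^13 bytes) drive read end to end at a constant average speed, which is a simplification since hard drives slow down toward their inner tracks. At 200-250 MB/s a full pass works out to roughly 11-14 hours, so a 16-hour check implies an average closer to 174 MB/s, which is consistent with that slowdown.

```python
# Back-of-the-envelope parity check duration for one full pass over a drive.
# ASSUMPTIONS: nominal 10 TB = 10**13 bytes, decimal MB/s, and a constant
# average speed (real drives are slower on their inner tracks).
DRIVE_BYTES = 10 * 10**12


def check_hours(avg_mb_per_s: float, drive_bytes: int = DRIVE_BYTES) -> float:
    """Hours to read drive_bytes at a constant avg_mb_per_s."""
    return drive_bytes / (avg_mb_per_s * 10**6) / 3600


def implied_avg_speed(hours: float, drive_bytes: int = DRIVE_BYTES) -> float:
    """Average MB/s implied by a full pass that took `hours`."""
    return drive_bytes / (hours * 3600) / 10**6


if __name__ == "__main__":
    for speed in (200, 250):
        print(f"{speed} MB/s -> {check_hours(speed):.1f} h")
    print(f"16 h -> ~{implied_avg_speed(16):.0f} MB/s average")
```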

 

It went fine for about a week, then errored again (roughly 1200 errors on Parity; Parity 2 was OK).

 

This time I removed the drive from the array and pre-cleared it (all stages) with no errors.

I left the array with just Parity 2 for about a week and no issues.

 

Then I stopped the array, re-added Parity, and re-ran the parity check.

 

All was well for another week or two, and then Parity dropped again (roughly 1200 errors; Parity 2 was OK).

 

At this point I shut down the server, disconnected and reconnected all of the drive connections at the backplane, disconnected and reconnected all the connections at the RocketRAID, and moved the RocketRAID to another PCIe slot.

 

Again, all went well for a week or two, and this time Parity 2 errored (again roughly 1200 errors; Parity was OK).

 

I shut the server down again, swapped the cables for these, and at the same time re-ordered the way they were plugged into the backplane, like this:

C0P0 C0P1 C0P2 C0P3

C1P0 C1P1 C1P2 C1P3

C2P0 C2P1 C2P2 C2P3

C3P0 C3P1 C3P2 C3P3

C4P0 C4P1 C4P2 C4P3

C5P0 C5P1 C5P2 C5P3

 

making sure the drives were plugged into the same CXPY as before.
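Both layouts use the same 24 leads (6 breakout cables x 4 ports); the first fills the bays column by column and the second row by row, so a drive that stays on the same CxPy ends up in a different bay. A small Python sketch, assuming the 6-row by 4-column bay grid drawn above, that rebuilds both grids and looks up where a given lead lands:

```python
ROWS, COLS = 6, 4  # 24 bays drawn as 6 rows x 4 columns, as in the grids above

# The 24 leads in cable order: C0P0..C0P3, C1P0..C1P3, ..., C5P3.
LEADS = [f"C{c}P{p}" for c in range(6) for p in range(4)]


def column_major(leads):
    """First layout: fill each column top to bottom, then move one column right."""
    grid = [[None] * COLS for _ in range(ROWS)]
    for i, lead in enumerate(leads):
        grid[i % ROWS][i // ROWS] = lead
    return grid


def row_major(leads):
    """Second layout: fill each row left to right, then move one row down."""
    return [leads[r * COLS:(r + 1) * COLS] for r in range(ROWS)]


def position(grid, lead):
    """(row, column) of the bay a lead is plugged into, counted from 0."""
    for r, row in enumerate(grid):
        if lead in row:
            return r, row.index(lead)
    raise KeyError(lead)


if __name__ == "__main__":
    old, new = column_major(LEADS), row_major(LEADS)
    for lead in ("C0P0", "C1P2", "C4P1"):
        print(lead, position(old, lead), "->", position(new, lead))
```

Since each drive followed its CxPy, it stayed on the same controller port while its physical bay changed.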

 

Up until this point I had always reused the same Parity drive, after pre-clearing it, when re-running the parity check.

By then I had also acquired the 3 extra 10TB drives. I pre-cleared one and, when it was done, set it as Parity and re-ran the parity check.

 

So now I've changed the card's PCIe slot, the cables, the physical location in the backplane, and the drive itself.

 

In the meantime, the original Parity drive passed pre-clear again.

 

Now, a week or so later, the replacement Parity drive has errored (this time only 1 error; Parity 2 is OK).

 

From the disk log of the disabled parity drive:

Feb 3 08:24:02 Guardian kernel: print_req_error: I/O error, dev sdm, sector 0
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Feb 3 08:24:02 Guardian kernel: print_req_error: I/O error, dev sdm, sector 0
Feb 3 08:24:02 Guardian kernel: Buffer I/O error on dev sdm, logical block 0, async page read
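The two middle lines of that log can be decoded by hand. hostbyte=0x04 is DID_BAD_TARGET, which suggests the host adapter side failed to reach the drive rather than the drive reporting a media error, and the CDB is a READ(16) (opcode 0x88) of 8 blocks at LBA 0, matching "sector 0" in the surrounding lines. A small Python sketch that does the same decoding; it only knows the handful of codes listed in it and assumes the log format shown above:

```python
import re

# Linux SCSI "host byte" values (DID_* constants in the kernel's SCSI headers).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
}
# A few SCSI command opcodes; READ(16)/WRITE(16) carry a 64-bit LBA.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)", 0x88: "READ(16)", 0x8A: "WRITE(16)"}

RESULT_RE = re.compile(r"\[(?P<dev>sd\w+)\].*hostbyte=0x(?P<host>[0-9a-f]{2})")
CDB_RE = re.compile(
    r"\[(?P<dev>sd\w+)\].*CDB: opcode=0x[0-9a-f]{2}\s+(?P<cdb>[0-9a-f]{2}(?: [0-9a-f]{2})*)"
)


def decode(lines):
    """Yield a short human-readable note for each Result/CDB kernel log line."""
    for line in lines:
        if m := RESULT_RE.search(line):
            code = int(m["host"], 16)
            yield f"{m['dev']}: host byte {HOST_BYTES.get(code, hex(code))}"
        elif m := CDB_RE.search(line):
            cdb = bytes.fromhex(m["cdb"])
            note = OPCODES.get(cdb[0], hex(cdb[0]))
            if cdb[0] == 0x88 and len(cdb) == 16:
                # READ(16): bytes 2-9 = LBA, bytes 10-13 = transfer length in blocks
                lba = int.from_bytes(cdb[2:10], "big")
                blocks = int.from_bytes(cdb[10:14], "big")
                note += f" LBA={lba} blocks={blocks}"
            yield f"{m['dev']}: command {note}"


LOG = """\
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
"""

if __name__ == "__main__":
    for note in decode(LOG.splitlines()):
        print(note)
```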

The pattern, which only became apparent as I was writing this, is that every 10 days or so it throws a parity error. In hindsight, I probably should have caught that sooner and tried rebooting every 7 days to see if that mitigated the issue, BUT I don't see that as a real solution.

 

In between all of the above, I had been doing some reading on the forums here and also ran the CA Common Problems plugin.

 

Seeing the 'Marvell Hard Drive Controller installed' warning led me to some articles about that controller causing random drives to drop.

 

I also saw that LSI controllers were preferred, and after doing some reading I decided to get 3 x LSI SAS 9211-8i HBAs with Integrated RAID to replace the RocketRAID.
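A quick port-count check for the swap, as a Python sketch. It assumes each 9211-8i has 2 internal connectors (its "8i" means 8 internal ports) and that each connector fans out to 4 drives through the same kind of breakout cable used with the 2760A's 6 connectors. cable_plan is a hypothetical helper that just spreads cables C0 to C5 across the three cards in order; it is not a required wiring.

```python
# Rough port budget for swapping one RocketRAID 2760A for three LSI 9211-8i cards.
# ASSUMPTION: every connector fans out to 4 drives via a breakout cable,
# like the existing cables C0-C5 (ports P0-P3).
PORTS_PER_CONNECTOR = 4
BAYS = 24  # SuperMicro 846 chassis


def total_ports(cards: int, connectors_per_card: int) -> int:
    """Drive ports provided by `cards` HBAs with `connectors_per_card` connectors each."""
    return cards * connectors_per_card * PORTS_PER_CONNECTOR


def cable_plan(num_cables: int, connectors_per_card: int) -> dict:
    """Hypothetical plan: assign cables C0..C{n-1} to (card, connector) in order."""
    return {
        f"C{c}": (c // connectors_per_card, c % connectors_per_card)
        for c in range(num_cables)
    }


if __name__ == "__main__":
    print("2760A x1:", total_ports(1, 6), "ports")
    print("9211-8i x3:", total_ports(3, 2), "ports for", BAYS, "bays")
    print(cable_plan(6, 2))
```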

 

The box describes it as:

 

Quote

LSI SAS 9211-8i

6Gb/s 8-Port SAS/SATA

Host Bus Adapter

with Integrated Raid

 

LSI SATA+SAS Host Bus Adapter

 

Manufacture date of 5/2016

 

 

So now I have those 3 LSI cards, and from what I've read, an HBA does not need to be flashed and works out of the box.

 

BUT after searching for the 9211-8i, I found this post:

Quote
LSI SAS 9211-8i 8 PCIe x8 SATA III SAS2008 flashed to IT mode [49]

 

which tells me that it needs to be flashed to IT mode using this:

 

Quote

LSI SAS2008 chipset

 

1) LSI SAS9211-8i

 

2) SuperMicro X8SI6-F with onboard SAS2008 controller

 

Please let me know (PM or post here) what Card you have successfully flashed with the LSI SAS2008 chipset with the provided zip.

 

FW: 10.00.02.00 / BIOS: 7.19.00.00 / 15-JUN-11 (LSI P10)

Both IT/IR mode available.

 

LSI SAS2008 Controllers(P10).zip - 2.59 MB (Windows)

After you expand the zip file, please read the file "__ReadMeFirst.txt" before doing anything!

LSI SAS2008 Controllers(P10)Linux.zip - 3.43 MB (unRAID)

 

FW: 11.00.00.00 / BIOS: 7.21.00.00 / 22-AUG-11 (LSI P11)

Both IT/IR mode available.

 

LSI SAS2008 Controllers(P11).rar - 2.21 MB (Windows)

After you expand the zip file, please read the file "__ReadMeFirst.txt" before doing anything!

LSI SAS2008 Controllers(P11)Linux.rar - 3.16 MB (unRAID)

 

 

<tl/dr>

On a dual parity system with a Marvell controller based card (RocketRAID 2760A), every 10 days or so the Parity (or sometimes Parity 2) drive throws an error (or about 1200) and is disabled.

Never had both parity drives down at the same time, nor a data drive down.

Pre-clearing the drive (all stages) succeeds, and re-running the parity build succeeds with no errors. 

Tried moving the card to a different PCIe slot, changing cables, changing backplane ports, and replacing the Parity drive (new pre-cleared drive, different backplane location and card port).

Bought 3 new LSI 9211-8i cards and intend to replace the 2760A with them.

</tl/dr>

 

Now to my questions:

 

1) Does the 9211-8i HBA with integrated RAID need to be flashed?

If so, do all three cards need to be flashed?

 

If the answer to #1 is yes,

2) The firmware listed above is from 2011, more than 6 years old. Will it support 10TB drives?

Is there a newer one? (There are currently 55+ pages in that thread and I didn't easily find an entry for newer firmware.)
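For context on question 2, here is the arithmetic behind the familiar 2.2 TB boundary, as a Python sketch assuming 512-byte logical sectors. A nominal 10 TB drive has about 19.5 billion sectors, well past what a 32-bit LBA can address, so the controller and firmware have to use 64-bit addressing such as READ(16) (opcode=0x88 in the log above). This only shows why capacity matters; it says nothing about whether a particular firmware release handles it.

```python
# Why drive size mattered for older controllers/firmware: a 32-bit LBA
# with 512-byte logical sectors tops out at 2**32 * 512 bytes (about 2.2 TB).
# ASSUMPTION: 512-byte logical sectors.
MAX_LBA_32 = 0xFFFFFFFF


def needs_64bit_lba(capacity_bytes: int, sector_size: int = 512) -> bool:
    """True if the drive's last logical sector can't be addressed with 32 bits."""
    last_lba = capacity_bytes // sector_size - 1
    return last_lba > MAX_LBA_32


def limit_tb(sector_size: int = 512) -> float:
    """Largest capacity (decimal TB) reachable with 32-bit LBAs."""
    return (MAX_LBA_32 + 1) * sector_size / 10**12


if __name__ == "__main__":
    ten_tb = 10 * 10**12  # nominal 10 TB drive
    print("sectors:", ten_tb // 512)
    print("needs 64-bit LBA:", needs_64bit_lba(ten_tb))
    print(f"32-bit limit: {limit_tb():.2f} TB")
```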

 

If I missed anything pertinent, let me know.

 

Thanks in advance.

 

Posted
21 minutes ago, juanamingo said:

1) Does the 9211-8i HBA with integrated RAID need to be flashed?

If so, do all three cards need to be flashed?

These controllers can be in RAID (IR) or IT mode. If they are in RAID mode, they need to be flashed to IT mode.

 

22 minutes ago, juanamingo said:

Is there a newer one?

Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.

Posted
50 minutes ago, johnnie.black said:

These controllers can be in RAID (IR) or IT mode. If they are in RAID mode, they need to be flashed to IT mode.

 

Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.

 

Thank you!

 

I'm assuming the only way to know whether it's in RAID or IT mode is to plug it in and see if it loads a RAID BIOS?

 

Maybe I'm better off just flashing all 3 with p20.00.07 to be sure.
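Booting into the card's BIOS screen works, but on a running Linux system such as unRAID there may be a quicker check. As far as I know, the mpt2sas/mpt3sas kernel driver logs each controller's firmware version and a Capabilities=(...) list when it loads, and that list includes Raid when IR firmware is running. The Python sketch below assumes exactly that log format (treat it as an assumption to verify against your own dmesg output, not a documented interface) and reports IR or IT per controller.

```python
import re
import subprocess

# ASSUMPTION: the mpt2sas/mpt3sas driver logs, per controller, a line like
#   <name>: Protocol=(...), Capabilities=(...)
# listing "Raid" when IR (integrated RAID) firmware is loaded, plus a line
# containing FWVersion(xx.xx.xx.xx). Verify against your own kernel log.
CAPS_RE = re.compile(r"(mpt[23]sas_cm\d+): .*Capabilities=\(([^)]*)\)")
FW_RE = re.compile(r"(mpt[23]sas_cm\d+): .*FWVersion\(([\d.]+)\)")


def controller_modes(log_text: str) -> dict:
    """Map each controller name to 'IR' or 'IT' based on its Capabilities list."""
    modes = {}
    for ctrl, caps in CAPS_RE.findall(log_text):
        tokens = [t.strip() for t in caps.split(",")]
        modes[ctrl] = "IR" if "Raid" in tokens else "IT"
    return modes


def firmware_versions(log_text: str) -> dict:
    """Map each controller name to the firmware version string it logged."""
    return dict(FW_RE.findall(log_text))


def read_kernel_log() -> str:
    """Return `dmesg` output, or an empty string if it can't be run."""
    try:
        return subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    except OSError:
        return ""


if __name__ == "__main__":
    log = read_kernel_log()
    print("modes:", controller_modes(log))
    print("firmware:", firmware_versions(log))
```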

 

Posted
54 minutes ago, johnnie.black said:

Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.

 

(For reference) From Broadcom's site:

 

Group: Legacy Products, Family: Legacy Host Bus Adapters, Product: SAS 9211-8i Host Bus Adapter, Asset type: Firmware

 

9211-8i_Package_P20_IR_IT_Firmware_BIOS_for_MSDOS_Windows

Package_P20_Firmware_BIOS_for_MSDOS_Windows
Version: 20.00.07.00
File Size: 1700 KB
Language: English
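To check whether a card is already on that 20.00.07.00 package, comparing the dotted version fields as integers is safer than comparing them as text, since the same version can be written with different zero padding (the BIOS versions quoted earlier are written 7.19.00.00 and 7.21.00.00, for example). A minimal Python sketch; where the installed version string comes from (the card's BIOS screen, the kernel log, etc.) is outside this snippet.

```python
def parse_version(version: str) -> tuple:
    """'20.00.07.00' -> (20, 0, 7, 0), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


LATEST = parse_version("20.00.07.00")  # P20 package listed above


def needs_update(installed: str, target: tuple = LATEST) -> bool:
    """True if the installed firmware version is older than the target."""
    return parse_version(installed) < target


if __name__ == "__main__":
    for fw in ("10.00.02.00", "11.00.00.00", "20.00.07.00"):  # P10, P11, P20
        print(fw, "needs update:", needs_update(fw))
```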
 
 

Archived

This topic is now archived and is closed to further replies.
