juanamingo Posted February 3, 2018

Hello all,

Hoping to get some definitive answers / advice for my issue. <tl/dr> is at the end.

System specs:

Quote
MBD GA-X99P-SLI
CPU Xeon E5-2640 V3
RAM 128 GB Corsair Vengeance DDR4
NIC Intel X540T2 10GB
VID Zotac Ge-Force GT-610
HBA RocketRAID 2760A
HDD 13 x 10TB IronWolf PRO
SSD 2 x 2TB 850 EVO
CSE SuperMicro 24 Bay 4U
PSU 2xSuperMicro 920W

I'm currently running a single RocketRAID 2760A 6 port card (worked out of the box - there's a setting in the MBD BIOS, I forget which at the moment, that skips the card's BIOS screen). It is connected to the back plane of my SuperMicro CSE-846TQ-R900B using all 6 of the ports on the RocketRAID.

Don't think it matters, but for completeness' sake - and to illustrate some of the troubleshooting steps I've taken - I had connected the breakout cables from the RocketRAID to the back plane top to bottom, left to right like

C0P0 C1P2 C3P0 C4P2
C0P1 C1P3 C3P1 C4P3
C0P2 C2P0 C3P2 C5P0
C0P3 C2P1 C3P3 C5P1
C1P0 C2P2 C4P0 C5P2
C1P1 C2P3 C4P1 C5P3

using these cables, where C is cable, P is port.

My array consists of 8 x 10 TB Seagate IronWolf Pros for data and 2 x 10 TB IronWolf Pros as parity, plus 2 x 2TB Samsung 850 EVOs as cache drives. There are also 3 x 10 TB IronWolf Pros that are pre-cleared and awaiting use.

The issue I'm having is this, and it has ALWAYS happened ONLY with the parity drives, never a data drive (I so hope I didn't just jinx myself).

After my first install of unRAID in ~ Sept '17, everything was perfect. After about 10 days of uptime, Parity dropped due to errors (about 1200 +/- on Parity, Parity 2 was ok). I stopped the array, removed and re-added the drive and re-ran the parity check. The parity check ran at about 200 - 250 MB/s (with all dockers stopped and no load on the system) and took about 16 hours. It went fine for about a week, then errored again (about 1200 +/- on Parity, Parity 2 was ok).
This time I removed the drive from the array and pre-cleared it (all stages) with no errors. I left the array with just Parity 2 for about a week with no issues. Then I stopped the array, re-added Parity and re-ran the parity check. All was well for another week or two and Parity dropped again (about 1200 +/-, Parity 2 was ok).

At this time I shut down the server, disconnected all of the drive connections from the backplane and re-connected them, disconnected all the connections from the RocketRAID and re-connected them, and plugged the RocketRAID into another PCIe slot. Again, all went well for a week or two and this time Parity2 errored (again, about 1200 +/- and Parity was ok).

I shut the server down again and changed out cables to these and at the same time re-ordered the way they were plugged into the backplane like this

C0P0 C0P1 C0P2 C0P3
C1P0 C1P1 C1P2 C1P3
C2P0 C2P1 C2P2 C2P3
C3P0 C3P1 C3P2 C3P3
C4P0 C4P1 C4P2 C4P3
C5P0 C5P1 C5P2 C5P3

making sure the drives were plugged into the same CXPY as before.

Up until this point I had always used the same Parity drive after pre-clearing it when re-running the parity check. I had also acquired the 3 extra 10TB drives at this time and pre-cleared one, and when it was done I set it as Parity and re-ran the parity check.

So now I've changed the card's PCIe slot, changed the cables, changed the physical location in the backplane and changed a drive. In the meantime, the original Parity drive passed pre-clear again. Now, a week or so later, the replacement Parity drive has errored (this time 1 error and Parity2 is ok).
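The two cabling orders described above can be sketched as follows. This is only an illustration: it assumes the backplane is a grid of 6 rows by 4 columns (inferred from the 24 labels, not stated anywhere in this post), and the function names are made up for the example. The first layout fills each column top to bottom, the second gives each cable its own row.

```python
ROWS, COLS = 6, 4  # assumed backplane grid: 6 rows x 4 columns = 24 bays


def labels():
    """All 24 cable/port labels in order: C0P0, C0P1, ..., C5P3."""
    return [f"C{c}P{p}" for c in range(6) for p in range(4)]


def column_major():
    """First layout: fill each column top to bottom, then move right."""
    seq = labels()
    return [[seq[col * ROWS + row] for col in range(COLS)] for row in range(ROWS)]


def row_major():
    """Second layout: each cable (4 ports) occupies one row, left to right."""
    seq = labels()
    return [[seq[row * COLS + col] for col in range(COLS)] for row in range(ROWS)]


def slot_of(grid):
    """Map each label to its (row, column) position in the given grid."""
    return {label: (r, c) for r, row in enumerate(grid) for c, label in enumerate(row)}


if __name__ == "__main__":
    before, after = slot_of(column_major()), slot_of(row_major())
    # Show where each cable/port connection sat before and after re-cabling.
    for label in labels():
        print(label, before[label], "->", after[label])
```

Since the drives stayed on the same CXPY, the printed mapping shows which backplane position each drive moved from and to between the two attempts.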
From the disk log of the disabled parity drive:

Feb 3 08:24:02 Guardian kernel: print_req_error: I/O error, dev sdm, sector 0
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Feb 3 08:24:02 Guardian kernel: sd 2:0:7:0: [sdm] tag#0 CDB: opcode=0x88 88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00
Feb 3 08:24:02 Guardian kernel: print_req_error: I/O error, dev sdm, sector 0
Feb 3 08:24:02 Guardian kernel: Buffer I/O error on dev sdm, logical block 0, async page read

The pattern, which became 'apparent' as I was writing this, is that every 10 days or so it would throw a parity error. In hindsight, I probably should have caught that sooner and rebooted every 7 days to see if that mitigated the issue, BUT I don't see that as a real solution.

Now, in-between all of the above I had been doing some reading on the forums here and also ran the CA Common Problems plugin. Seeing the 'Marvell Hard Drive Controller installed' warning led me to some articles about that controller causing issues with random drives dropping. I also saw that LSI controllers were preferred, and after doing some reading I decided on getting 3 x LSI SAS 9211-8i HBA with integrated RAID to replace the RocketRAID. The box describes it as:

Quote
LSI SAS 9211-8i 6Gb/s 8-Port SAS/SATA Host Bus Adapter with Integrated Raid
LSI SATA+SAS Host Bus Adapter
Manufacture date of 5/2016

So now I have those 3 LSI cards, and from what I've read, an HBA does not need to be flashed and works out of the box. BUT after searching for 9211-8i I found this post

Quote
LSI SAS 9211-8i 8 PCIe x8 SATA III SAS2008 flashed to IT mode [49]

which tells me that it needs to be flashed to IT mode using this

Quote
LSI SAS2008 chipset
1) LSI SAS9211-8i
2) SuperMicro X8SI6-F with onboard SAS2008 controller
Please let me know (PM or post here) what Card you have successfully flashed with the LSI SAS2008 chipset with the provided zip.
FW: 10.00.02.00 / BIOS: 7.19.00.00 / 15-JUN-11 (LSI P10) Both IT/IR mode available.
LSI SAS2008 Controllers(P10).zip - 2.59 MB (Windows)
After you expand the zip file, please read the file "__ReadMeFirst.txt" before doing anything!
LSI SAS2008 Controllers(P10)Linux.zip - 3.43 MB (unRAID)
FW: 11.00.00.00 / BIOS: 7.21.00.00 / 22-AUG-11 (LSI P11) Both IT/IR mode available.
LSI SAS2008 Controllers(P11).rar - 2.21 MB (Windows)
After you expand the zip file, please read the file "__ReadMeFirst.txt" before doing anything!
LSI SAS2008 Controllers(P11)Linux.rar - 3.16 MB (unRAID)

<tl/dr>
On a dual parity system with a Marvell controller based card (RocketRAID 2760A), every +/- 10 days the Parity (or possibly Parity 2) drive will throw an error (or 1200) and be disabled. Never had both parity drives down at the same time, nor a data drive down. Pre-clearing the drive (all stages) succeeds, and re-running the parity build succeeds with no errors. Tried changing PCIe slots for the card, changed cables, changed ports on the back plane, changed Parity drive (new, pre-cleared drive, different back plane location & port on card). Bought 3 new LSI 9211-8i cards and intend to replace the 2760A.
</tl/dr>

Now to my questions:

1) Does the 9211-8i HBA with integrated RAID need to be flashed? If so, do all three cards need to be flashed?

If the answer to #1 is yes,

2) The FW listed above is from 2011, more than 6 years old - will it support 10TB drives? Is there a newer one? (There are currently 55+ pages in that thread and I didn't easily find another entry for a newer FW.)

If I missed anything pertinent, let me know. Thanks in advance.
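For anyone reading the disk log above, the failed command can be decoded by hand. Here is a minimal sketch, not a diagnostic tool. Opcode 0x88 is the SCSI READ(16) command, which carries an 8-byte LBA followed by a 4-byte transfer length, and hostbyte 0x04 is DID_BAD_TARGET in the Linux SCSI host byte codes. The function names are made up for the example, and the byte count printed at the end assumes 512-byte logical sectors.

```python
import re

# Host byte codes as defined by the Linux SCSI layer.
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}


def decode_read16(cdb_hex):
    """Return (lba, block_count) from a READ(16) CDB given as hex bytes."""
    cdb = bytes.fromhex("".join(cdb_hex.split()))
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: 64-bit LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


def decode_hostbyte(log_line):
    """Return the symbolic name of the hostbyte value found in a kernel log line."""
    match = re.search(r"hostbyte=0x([0-9a-fA-F]+)", log_line)
    if match is None:
        raise ValueError("no hostbyte field in line")
    code = int(match.group(1), 16)
    return HOST_BYTES.get(code, f"unknown (0x{code:02x})")


if __name__ == "__main__":
    lba, blocks = decode_read16("88 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00")
    # Byte count assumes 512-byte logical sectors (an assumption, not from the log).
    print(f"READ(16): LBA {lba}, {blocks} blocks ({blocks * 512} bytes)")
    print(decode_hostbyte("Result: hostbyte=0x04 driverbyte=0x00"))
```

So the logged command is a read of 8 blocks starting at LBA 0, which lines up with the "sector 0" and "logical block 0" messages, and the host byte says the target did not respond to the host adapter rather than the drive reporting a media error.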
JorgeB Posted February 3, 2018

21 minutes ago, juanamingo said:
1) Does the 9211-8i HBA with integrated RAID need to be flashed? If so, do all three cards need to be flashed?

These controllers can be in RAID or IT mode. If they are in RAID mode, they need to be flashed to IT mode.

22 minutes ago, juanamingo said:
Is there a newer one?

Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.
juanamingo Author Posted February 3, 2018

50 minutes ago, johnnie.black said:
These controllers can be in RAID or IT mode. If they are in RAID mode, they need to be flashed to IT mode. Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.

Thank you! I'm assuming the only way to know if it's in RAID or IT mode is to plug it in and see if it loads a RAID BIOS? Maybe I'm better off just flashing all 3 with p20.00.07 to be sure.
juanamingo Author Posted February 3, 2018

54 minutes ago, johnnie.black said:
Yes, the latest firmware is p20.00.07. You can download it from Broadcom's support site; it's in the legacy section.

(For reference) From Broadcom's site:

Group: Legacy Products, Family: Legacy Host Bus Adapters, OEM: , Product: SAS 9211-8i Host Bus Adapter, Asset type: Firmware
Keyword: 9211-8i_Package_P20_IR_IT_Firmware_BIOS_for_MSDOS_Windows

Package_P20_Firmware_BIOS_for_MSDOS_Windows
Version: 20.00.07.00
File Size: 1700 KB
Language: English