Beardmann

Everything posted by Beardmann

  1. The NetApp shelves are rock solid... If you use the DS4246 you normally do not need all four PSUs installed unless you run 10k RPM drives, so one PSU in the upper left corner and one in the lower right corner is how they ship from NetApp with SATA or NL-SAS 7.2k drives. Also, in a Linux setup with a SAS HBA you only need the upper IOM6 module installed; just pull out the lower one (it only wastes power)... You want to connect your HBA to the port with the square symbol next to it, and if you want to loop in another shelf, you connect from the circle port to the square port of the next shelf (that is how NetApp does it... not 100% sure it makes any difference on a plain SAS HBA). You might also want to give the shelves separate IDs on the front: hold down the button for a few seconds, and while it blinks you can change one of the digits, then hold it again to change the other digit, and one last time to save the ID; it then starts blinking amber, and you power cycle the shelf. If you have the money, you might want to replace the IOM6 modules with IOM12 (right now they are about $1,000 apiece), and of course you will then need an HBA that can do 12G, new MiniSAS HD cables and possibly some SAS disks that can do 12G... I can only say nice things about these shelves... I have customers with 8+ year old shelves still going strong... disks, PSUs and the odd IOM module have been replaced along the years, but other than disks the components rarely die... Good luck...
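     A quick way to check that a plain Linux SAS HBA actually sees the shelf and its drives after cabling it as above (just a rough sketch; the exact device list will of course vary and lsscsi/sg3-utils may need installing first):

        lsscsi -g                  # disks plus the shelf itself (the "enclosu" entry is the IOM's SES device)
        lsscsi -t                  # same list with SAS addresses, handy for matching drives to shelf slots
        ls /sys/class/enclosure/   # enclosures the kernel's ses driver has attached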
  2. Hi there, I know your question is kinda old, but since I am working with NetApp gear on a daily basis I think I can help you a bit 🙂 You are correct that amber is caused by an error of some kind. It can be amber on the disks, on the shelf (at the front), on the PSUs at the back, and on the IOM modules...

     If I were you I would start out with two PSUs (for the DS4246); you only need four PSUs if you are using SAS 10k drives. NetApp ships them with SATA/NL-SAS 7.2k drives and two PSUs... Also, only use one of the IOM6 modules (the top one) and remove the lower IOM module; you can only use both if all your drives are SAS drives, as they are dual-ported, which SATA drives are not (NetApp uses a little converter board which converts SATA to SAS)... but basically forget about getting dual-path working with anything other than a NetApp controller.

     If you have to connect more shelves together you have to connect them to the right ports: on the IOM6 modules you have a port with a circle and one with a square. You want to connect your HBA to the first shelf using the square port, then connect the circle port from that shelf to the square port on the next shelf you want to link. And yes, the shelves have to have different IDs... yet I'm not sure anything other than a NetApp controller cares about that; even with a loop of same-ID shelves a NetApp controller can see all disks, they will just be numbered with a strange ID.

     Power the shelves on... if anything lights up amber now, identify which amber light it is:
     Amber disk = faulty disk (duh!)
     Amber PSU = no power, not turned on, or faulty PSU (replace it)
     Amber IOM module = most likely a faulty IOM module, or possibly faulty cabling (check that no circle port is connected to another circle on the other shelf, and the same with the square ports); if it is still amber, replace the IOM module.
     Amber shelf = this can be caused by the temperature sensors, faulty fans, or faulty voltages.

     You can check some of the issues above with the sg3-utils package for Linux (see the sketch below)... I cannot remember the exact commands, and it is kinda hairy... so if you have an old NetApp controller handy, it will be much easier to troubleshoot by just booting it into maintenance mode and issuing an "environment status" command, which should tell you what's wrong 🙂

     The DS shelves from NetApp are in general very stable! I have customers with shelves that are 7-8 years old now and still going strong... we have replaced disks and the odd PSU and IOM module, but never a whole shelf... In my 15+ years working with NetApp gear I have replaced a whole shelf two or three times, one of which NetApp support wasn't even sure what was wrong with, and they just replaced it to be sure... I hope you figure it out...

     Another thing you might want to know: just as it was possible to upgrade your shelf from IOM3 modules to IOM6, it is also possible to upgrade to IOM12... Of course it only makes sense if you have SAS drives capable of 12G speeds, and of course an HBA that can do 12G (this all uses different cables too: MiniSAS HD compared to QSFP for 3G and 6G)... And remember that it is still quad 12G per cable, which confuses some people, so there are four SAS lanes into the shelf. The only problem with the IOM12 is that it is pricey at the moment (about $1,000 used). Good luck 🙂 /B
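     For the sg3-utils part, this is roughly the kind of thing I mean (a minimal sketch; /dev/sg12 is only an example, use whatever sg node lsscsi reports for the enclosure):

        lsscsi -g                    # find the shelf: the "enclosu" entry and its /dev/sg node
        sg_ses /dev/sg12             # lists the diagnostic pages the IOM's SES processor supports
        sg_ses --page=es /dev/sg12   # Enclosure Status page: disk slots, fans, PSUs, temperature and voltage sensors
        sg_ses --page=ed /dev/sg12   # Element Descriptor page: names each element so you can match it to a status above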
  3. Hi there. I also have an X520 dual NIC in my server. I have only one port connected and I am able to see just one of the two 10G ports... I was wondering if this is by design? So if nothing is connected to the NIC port, the driver is just not loaded? Because then why is this not the case with the internal 1G NICs in my server? There I have one connected and another one which is not, yet both are identified, as eth0 and eth1... A bit strange to me.

     lspci shows:
     04:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
     04:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
     and
     07:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)
     08:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

     But I only have three devices, eth0 and eth1 (1G) and eth2 (10G)... but no eth3. /B
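     A few checks that usually narrow this down (a rough sketch, nothing unRAID-specific; the interface name is the one from the post above):

        ip -br link             # every interface the kernel created, whether or not a cable is plugged in
        ls /sys/class/net/      # same thing, independent of link state
        dmesg | grep -i ixgbe   # the X520 uses the ixgbe driver; look for messages about the second port
        ethtool -i eth2         # confirms which PCI address (04:00.0 or 04:00.1) eth2 actually maps to

     One thing worth knowing: the ixgbe driver can refuse to bring up a port whose SFP+ module it does not recognise, which can look like the second interface is simply missing.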
  4. Hi trurl. Sorry for not uploading any diagnostics... as mentioned this is a test system, so not at all critical. The system is rebuilding, yes, but I had to replace the physical disk with a disk that the system has not seen, in order to get it to work. So I guess my question was whether Unraid keeps track of the physical disks it has seen? Because no matter what I tried (from the GUI) I was unable to re-introduce the same disk as parity again... even after reboots... and on another system I tested the "failed" parity disk with an extended SMART test which showed no errors...

     I do not entirely agree with you on "failed connections" being more common than failed disks... maybe with consumer-grade hardware, but this is enterprise gear all the way, and I have been working with NetApp gear for 15+ years and I have yet to see a copper SAS cable fail once it was working 🙂 Of course laser optics/cables and SFP modules can fail... but copper SAS cables, not so much 🙂 But let's not get bogged down in semantics.

     Just a quick note... If I do a "cat /proc/mdstat" I can see that the last entries state this:
     diskName.29=
     diskSize.29=2930266532
     diskState.29=6
     diskId.29=ST33000650NS_SA_Z294PYBV_350000c9000354a4c
     rdevNumber.29=29
     rdevStatus.29=DISK_INVALID
     rdevName.29=sdg
     rdevOffset.29=64
     rdevSize.29=2930266532
     rdevId.29=ST33000650NS_SA_Z294PYBV_350000c9000354a4c
     rdevReads.29=0
     rdevWrites.29=122410131
     rdevNumErrors.29=0

     Which to me suggests that Unraid keeps track of the failed disk? Any way to make it forget this? /B
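     If it helps, the same status can be pulled for every slot at once; this is just built on the /proc/mdstat output quoted above:

        grep -E "rdevStatus|rdevId" /proc/mdstat   # one status/identity pair per slot; DISK_INVALID marks the slot Unraid has flagged

     As far as I know the array assignments themselves (disk serial numbers per slot) live in super.dat under /boot/config on the flash drive, so that is presumably where the "memory" of a disk sits.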
  5. Hi there. I am currently just testing a future setup with a server connected to a NetApp DS4246 shelf with SAS drives installed. I created a simple setup with two parity disks and 4 data disks, which works... I then tried to pull one of the data disks to simulate a failure (the shelf is hot-plug)... It took a while before the system noticed the disk was missing (maybe because I didn't have any load on the system)... but not just the data disk failed, one of the two parity disks was also marked as failed, which I of course find worrying...

     I then tried to reseat the parity disk, but it remained failed. I even tried to stop the array, remove the parity disk from the array, and clear the disk (from the GUI) as much as I could, but no matter what I did, I was unable to add it back in its original location... it just remained "disabled"... as if it knew it was the same disk somehow? I even tried to reboot the server and try it all again, but no dice... I then replaced the disk with a new one, and this worked just fine, and the parity is now rebuilding...

     Can anyone please explain why I was unable to reuse the "failed" disk? Because it is not failed, it works just fine, and I was able to do a complete SMART check with no errors... Does Unraid save the unique serial number of the disks it has seen, and just flat out deny you the possibility to reuse them again? I am also a bit worried about the fact that pulling one disk contributed to another one failing... not sure how this is even possible... Sadly, as I rebooted, I also lost the /var/log/messages to see what happened... (I will of course try this again once the resync has completed.)

     ...and no, there are no problems with my HBA or the SAS cable I use to connect to the DS4246 shelf... I am quite sure about this because as it rebuilds now, there are no errors in the logs from the HBA or any of the disks... As explained this is just a test setup with 3TB disks... the final setup will be with 18TB disks, and failures like this will take forever to rebuild (with 3TB disks I'm looking at 5-6 hours, which I can then multiply by 6) 🙂 Any help is appreciated. /B
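     Before repeating the pull test, it is worth capturing evidence that survives a reboot and pre-checking the disk (a minimal sketch; the device name and file name are examples, and /boot is the flash drive on Unraid, so anything copied there persists):

        cp /var/log/messages /boot/pull-test-$(date +%F).log   # keep a copy of the log on the flash drive before rebooting
        smartctl -t long /dev/sdg                              # start an extended SMART self-test on the pulled disk
        smartctl -a /dev/sdg                                   # afterwards: self-test result, reallocated/pending sectors, error log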