aim60

Posts posted by aim60

  1. The server has been rock solid for almost a year.  Not one sync error.

    Added a 2TB EARS drive & ran 3 Preclear passes.

    Replaced a 1.5TB ST31500341AS data drive with the EARS drive.  No errors during the rebuild.  Started a parity check when the rebuild completed and immediately got 3 sync errors.  The check completed without any additional errors.

     

    Please advise.

     

    Unraid 4.5 Final, Coolermaster CM590, Supermicro C2SEE, Intel Celeron E1400, Crucial 4GB (2x2GB), Corsair 450VX

     

     

    Syslog.zip

  2. I screwed 3.5-to-5.25 inch adapter brackets to the hard drives and mounted them one per bay in a CM590.  Not as convenient as hot-swap bays, but I can remove one drive at a time without disturbing the others.  And being farther apart, they get good airflow without front fans.  I'm using 2 140mm top fans & 1 120mm rear fan, and all of the drives stay reasonably cool.  5400/5900 RPM drives run a lot cooler than 7200s.  I don't plan on running out of drive bays any time soon.

  3. I upgraded my Unraid to the 4.5 final. I was on 4.5 beta6.

    When I restarted the machine, the array did not restart automatically.

    So I went to the web page and it said "upgraded disk".  I started the array and it is now rebuilding my disk1 drive in the array.

    I did not make any hardware changes.

     

    Can someone tell me why it would do this?

     

    A very interesting action took place here: your Disk 1 was resized, which left it inconsistent with the disk config in unRAID, caused a rewrite of the MBR, and then triggered a rebuild of the disk, which was perhaps unnecessary.  Your Disk 1 had a Gigabyte HPA installed, and is connected to the first SATA port on the motherboard.  No other drives have HPAs.  Here are the relevant lines:

    Dec 17 16:36:25 Tower kernel: ata7: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

    Dec 17 16:36:25 Tower kernel: ata7.00: HPA unlocked: 1953523055 -> 1953525168, native 1953525168

    Dec 17 16:36:25 Tower kernel: ata7.00: ATA-8: WDC WD10EADS-00M2B0, 01.00A01, max UDMA/133

     

    Dec 17 16:36:25 Tower kernel: md: import disk1: [8,96] (sdg) WDC WD10EADS-00M2B0                           WD-WMAV50195436 offset: 63 size: 976762552

    Dec 17 16:36:25 Tower kernel: md: disk1 wrong

     

    Dec 17 16:37:16 Tower emhttp: writing mbr on disk 1 (/dev/sdg)

    Dec 17 16:37:16 Tower emhttp: re-reading /dev/sdg partition table

    Dec 17 16:37:16 Tower kernel:  sdg: sdg1

    Dec 17 16:37:17 Tower kernel: mdcmd (31): start UPGRADE_DISK

     

    Dec 17 16:37:17 Tower kernel: md: recovery thread rebuilding disk1 ...

     

    Dec 17 16:37:19 Tower emhttp: resized: /mnt/disk1

     

    There is some concern here, because we really don't know yet what actually happened, and therefore don't know what will happen when this rebuild reaches the last megabyte of the drive.  If the size change is artificial, that is, if the kernel is saying this *should* be the true size but the hard drive firmware has not truly removed the HPA, then there are going to be drive errors at the end of the rebuild, when the drive refuses writes to that area.  If this latest kernel now includes logic to actually remove the HPA *AND* make the Gigabyte board turn off this "BIOS backup in an HPA" feature, then that is a great new feature of the kernel, and the rebuild should write zeroes into that area, clearing it.  I have to wonder, though, whether this is going to stop the Gigabyte BIOS from trying to create an HPA again on the *next* boot.  It will be good to hear from other Gigabyte board owners with HPAs.  What is especially interesting here is what happens at the end of this drive, and what happens on the next boot.
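
     

    If you want to verify for yourself whether the HPA is really gone, hdparm can report the drive's visible versus native sector count.  This is just a sketch of the check (not from the posts above), using the sector counts shown in your syslog; the read-only query is safe, but be very careful with the "set" form, since shrinking the visible size of an array disk will corrupt it:

     

      # Read-only query: prints "max sectors = visible/native" and whether an HPA is enabled
      hdparm -N /dev/sdg
      #   e.g.  max sectors = 1953523055/1953525168, HPA is enabled    <- HPA still present
      #   or    max sectors = 1953525168/1953525168, HPA is disabled   <- HPA removed

      # Only if you deliberately decide to remove the HPA yourself (double-check the device first;
      # depending on the hdparm version it may also insist on --yes-i-know-what-i-am-doing):
      # hdparm -N p1953525168 /dev/sdg    # the "p" prefix makes the native size permanent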

     

    There is another possibility: did you perhaps find a BIOS setting that disables this feature, and just change it now?  Perhaps the new kernel detects that and tries to recover the space.

     

    Just to be clear, there is and was nothing wrong with the drive, but the kernel has attempted to remove the HPA, which changes the size of the drive, and that makes unRAID think the drive has changed.

     

    I feel I need to caution you here about the action you took, especially so quickly.  Any time the Web Management indicates an action or status for a drive that is not in accord with our understanding of that drive, you really should step back and try to find out what happened first, before proceeding.  When it said that the drive needed to be rebuilt, that was in effect similar to it saying the drive needs formatting, and you would not want to proceed very quickly if you unexpectedly saw that message.  A request to rebuild is effectively a request to completely overwrite the drive, in effect losing everything that was stored there (although we hope it will overwrite with what is already there).

     

    The first step to take is to check the Device assignments, to make sure that the new kernel has not changed the order of drive detection, leaving a different drive assigned there.  I don't think we have had a catastrophic case like that yet; rather, device changes have simply resulted in unassigned drives.  Still, if the parity drive had somehow been assigned as Disk 1, it could have resulted in the complete loss of Disk 1.  I would want to make absolutely sure that the physical drive I am about to overwrite with the contents of Disk 1 is really the correct drive and serial number.  After that, I would want some idea of why it is trying to overwrite this disk.  It could be valid, or not, and I would very much want to know if it should not be overwritten/rebuilt.

     

    In this case, after verifying the drive assignments, all you needed to do was run the Trust My Array procedure.  It would have reported a number of parity errors at the very end of the drive, but that is expected.

     

    Being a Linux n00b, I don't know whether this affects the unRAID kernel:

      Kernel Bug - Do NOT disable HPA by default -> leads to data loss

      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/380138

  4. I have been burning in two ST32000542AS 5900 RPM 2TB drives by running 2 copies of Preclear simultaneously.  With version .9.8, each pass takes 30 hours!

     

    I manually spin down the array nightly from the standard web interface.  During the 4th pass, pressing the Spin Down button caused the web interface to become completely unresponsive.  I couldn't even launch the web interface from another PC.  UnMenu worked fine, and I could access files via the disk shares or user shares.

     

    Gave up for the night.  In the morning, all was well.  The syslog showed a 24 minute delay between pressing Spin Down and Unraid trying to spin down the drives.

     

    Dec  3 20:51:09 Tower emhttp: shcmd (54): sync

    Dec  3 21:15:12 Tower emhttp: shcmd (55): /usr/sbin/hdparm -y /dev/sdg >/dev/null

    Dec  3 21:15:12 Tower emhttp: shcmd (56): /usr/sbin/hdparm -y /dev/sdc >/dev/null

    Dec  3 21:15:12 Tower emhttp: shcmd (57): /usr/sbin/hdparm -y /dev/sda >/dev/null

    Dec  3 21:15:13 Tower emhttp: shcmd (58): /usr/sbin/hdparm -y /dev/sdb >/dev/null

    Dec  3 21:15:13 Tower emhttp: shcmd (59): /usr/sbin/hdparm -y /dev/hdg >/dev/null

     

    There are 3 other examples in the log, while the simultaneous Preclears were running, where pressing Spin Down immediately spun down the drives.

     

    4.5 Beta 11, md_num_stripes=5120, C2SEE, 4GB RAM, disks connected to the integrated ICH10.

     

    You basically experienced "resource contention" (too much going on, too little memory for it all to happen at once).  One of the pre-clear processes was using a resource the others needed, so they waited until it was free.
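
     

    For what it's worth, the long gap between the "sync" at shcmd (54) and the first hdparm call at shcmd (55) is consistent with that: the spin-down handler calls sync first, and sync cannot return until all of the write cache dirtied by the two preclear write phases has been flushed to disk.  A rough way to watch this yourself is to look at the Dirty/Writeback counters in /proc/meminfo (a sketch only, assuming grep is available at the unRAID console; nothing here is specific to the preclear script):

     

      # Show how much dirtied data (in kB) the kernel still has to flush;
      # while two preclears are writing, these numbers can stay large for a long time,
      # and "sync" will not return until they drain.
      grep -E 'Dirty|Writeback' /proc/meminfo

      # Repeat it every few seconds to watch the backlog drain:
      # watch -n 5 "grep -E 'Dirty|Writeback' /proc/meminfo"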

     

    You will benefit from the three parameters I added most recently, which allow you to specify smaller block sizes when reading and writing, and a smaller number of blocks as well.  Those parameters are:

     

          -w size  = write block size in bytes
          -r size  = read block size in bytes
          -b count = number of blocks to read at a time

    They are described in more detail in this post: http://lime-technology.com/forum/index.php?topic=2817.msg39972#msg39972
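
     

    For illustration only (this is not from Joe L.'s post, and the numbers are made up), an invocation using those switches might look like the line below, assuming the script is run as preclear_disk.sh and the sizes are given in bytes as described above:

     

      # Hypothetical example: 256 KiB read/write blocks, 200 blocks per read.
      # Replace /dev/sdX with the disk actually being precleared (double-check the device!).
      ./preclear_disk.sh -w 262144 -r 262144 -b 200 /dev/sdX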

     

    Joe L.

     

    Thanks
