woolooloo

Posts posted by woolooloo

  1. My array has been kind of neglected for a while. Anyway (I apologize for the long story about the perfect storm), I have been upgrading some drives, trying to consolidate data from old 1.5TB drives onto new drives so I can get rid of them. Along the way I found one drive that was basically dead. I was able to get pretty much everything off of it onto another drive that had enough space, so it stayed in the array, empty, while I finished consolidating the other drives.

     

    Then I find another drive that is failing. Not as bad as the first, but a full 3TB drive that I do not have enough free room to move onto other drives. So I ordered a new 4TB drive and put it in to rebuild and expand the file system. Unfortunately the pretty much dead 1.5TB drive still in the array has all sorts of read errors while rebuilding. I do a check on the new drive and sure enough, even though UNRAID says it is full, Windows shows it only has 250GB of files and a spot check shows a lot of empty directories and corrupted files.

     

    Knowing my array is pretty much bust with the multiple failed drives, I go ahead and nuke it and start up a new array with no parity, pulling all the old 1.5TB drives, just so I can access the new drive. I also put in the 3TB drive that was failing. I run reiserfsck --rebuild-tree /dev/md2 (in maintenance mode) on the new drive and it is slowly churning away, finding problems. Realizing it will take all night, I start a second remote session and fire reiserfsck --rebuild-tree off against the failing 3TB drive too.

     

    Then this morning I find out that FUCKING Windows decided last night would be a fine time to restart for an update (although I am continually trying to disable that crap), and it kills off both of my remote sessions and kills the reiserfscks mid-process. This morning after a server reboot, both of those drives are showing as unformatted.

     

    Is there any hope of recovering anything off of those drives?

  2. I played around with this today. It is a Gigabyte GA-MA780G-UD3H with 2 x16 slots (one running at x4) and 3 x1 slots. Two of the x1 slots are disabled if an x4 card is installed in the second x16 slot. Anyway, I had the x8 SASLP card in the x16 slot, and I tried the Syba x1 2 port SATA card in all the other slots (including the second x16 (x4) slot). In some slots, the Syba card's BIOS would show up first and hang. In other slots the SASLP would show up, but the Syba card never did.

     

    I finally moved the SASLP into the second x16 slot (running x4 even though it is an x8 card) and put the Syba card in the first x16 slot. That boots up properly, the SASLP card shows up first, then the Syba card shows up and everything boots and I can finally access the cache drive.

     

    Unfortunately in this config the x8 SASLP card is crippled a bit running x4, but that's what my previous card was so at least I'm not going backwards.


    I looked through the BIOS and could not find anything that looked like it would change boot priorities or anything for the PCI-e slots so I'm out of ideas. I may get another 4TB drive and merge a couple of my older 1.5TB drives then put the cache on the SASLP and get rid of the Syba card altogether. But if anyone ever thinks of anything else, please let me know.

  3. Some background: my 13 drive plus parity plus cache system had been running with 6 drives on the MB SATA ports, 8 drives on a Supermicro AOC-SASLP-MV8 PCI-e x4 controller, and the final drive on a 2 port PCI-e x1 SATA card.

     

    Recently the SASLP went bad and I just swapped it out with its replacement AOC-SAS2LP-MV8 which is a PCI-E x8 variant.

     

    When I booted it up the first time, the system hung up on the 2 port PCI-e x1 SATA card. It showed the bios screen from that card but never moved on. After reseating everything and trying again, same problem.

     

    I swapped the 2 port card to a different PCI-e x1 slot and everything booted up and seemed to be fine. Then I noticed that my cache drive is not showing up - and that is the one that is on the 2 port card. I've moved the card around to different PCI-e x1 slots (3 of them) and even tried the other PCI-e x16 slot that the SASLP is not using. Depending on which slot it is in, either the 2 port card shows up before the SASLP during boot and the system hangs, or the card never shows up.

     

    All this is happening before UNRAID starts, so I'm not sure if there is anything helpful in my log, but I'm attaching it anyway. If anyone has any ideas on how to sort out this apparent conflict between my new SAS2LP and my 2 port card, I would appreciate it.

    syslog-new setup.txt

  4. It's happened multiple times now after reboot so I will look to identify which controller is attached to these drives and replace it. Thanks for the insight. I may need some help figuring out how to replace the controller while maintaining the integrity of the array - especially since one of the drives is dead so I can't just rebuild parity.

  5. Ok sorry, got pulled into jury duty last week which turned my life upside down, finally digging my way out.

     

    After starting a rebuild, the errors started up basically immediately, and my syslog grew to 128MB of read errors by the time I stopped it a couple of minutes later. I've truncated it to post and removed all the repetitive read errors that were at the end.

     

    After stopping the rebuild, unRAID says there are 6 drives missing. Disk 10 is actually the one that I have replaced. My icydocks hold 4 drives, but if memory serves, at least one of my SATA cards has 6 drives, could that be going bad?

     

    1051278991_Arrayafterrebuildattempt.thumb.png.5e01ccd5e3f9ecf26523f2e99a77b1b2.png

    syslog-2018-10-17 - truncated.txt

  6. I'm on 5.0.6, I do not use my server that actively and have not gotten around to upgrading to 6. It's been working fine with minimal effort for a couple years. I recently noticed it had been like a year since I had done a parity check, so I kicked one off the other day. I came back to check the status today and one of my disks was disabled. It was an older 1.5TB drive and I had a spare one sitting around, so I went ahead and swapped it out.

     

    Since then, I've been trying to get it to rebuild onto the new disk, but 1) the write count on the new drive never increments even though the % complete on the rebuild keeps going up and 2) after a while other drives start showing a massive number of errors and if I try to look at their contents, those disks show up as empty. Stopping the array then shows those drives as disabled. A reboot seems to clear it back up and starts another rebuild. If I stop the array and remove the new disk from the array, then start the array, it is showing all of the files on that disk through emulation, so everything still seems intact at this point. I just do not understand what is causing the rebuild to fail. I guess it could be a failing SATA controller or a failing IcyDock or other hardware component, but I'm not sure the best way to track it down and I don't want to nuke multiple disks through repeated attempts.

     

    Anyone have thoughts on the best way to approach this? TIA

  7. I think everything looks good except the "30 sectors are re-allocated at the end of the preclear". I know this is kind of what the preclear is supposed to do, but should I be worried? Most of the logs others have posted show 0 sectors. Is 30 troublesome? Should I run it again and see if the count continues to increase?

     

    BTW, this is a Seagate recertified drive that they sent me to replace a failed drive.

     

    I ran it again and got this:

     

    32 sectors had been re-allocated before the start of the preclear.
    32 sectors are re-allocated at the end of the preclear,
        the number of sectors re-allocated did not change.
    

     

    I'm glad that the preclear didn't find any more sectors to re-allocate, but I'm a little disturbed that the count went from 30 to 32 after the last preclear while the drive was sitting unused. I guess I'll go ahead and use the drive, but if anyone finds this really troubling, please let me know.
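
    For anyone who wants to track this over more runs, here is a minimal Python sketch that pulls the re-allocated counts out of the two preclear summaries (the first run's 0 to 30 and the second run's 32 to 32) and shows where the count moved. The function names and the FIRST_RUN / SECOND_RUN constants are made up for illustration; the regular expressions only assume the summary wording shown in these reports.

```python
import re

# The two summary lines about re-allocated sectors, as worded in the reports here.
_BEFORE = re.compile(r"(\d+) sectors had been re-allocated before the start of the preclear")
_AFTER = re.compile(r"(\d+) sectors are re-allocated at the end of the preclear")


def reallocated_counts(report):
    """Return (before, after) re-allocated sector counts from one preclear report."""
    before = _BEFORE.search(report)
    after = _AFTER.search(report)
    if before is None or after is None:
        raise ValueError("report does not contain both re-allocation summary lines")
    return int(before.group(1)), int(after.group(1))


def compare_runs(first, second):
    """Summarize how the re-allocated count moved during and between two runs."""
    b1, a1 = reallocated_counts(first)
    b2, a2 = reallocated_counts(second)
    return {
        "during_first": a1 - b1,   # sectors re-allocated while run 1 was working
        "between_runs": b2 - a1,   # sectors re-allocated while the drive sat idle
        "during_second": a2 - b2,  # sectors re-allocated while run 2 was working
    }


# Summary lines copied from the two reports (hypothetical constant names).
FIRST_RUN = """\
0 sectors had been re-allocated before the start of the preclear.
30 sectors are re-allocated at the end of the preclear,
    a change of 30 in the number of sectors re-allocated.
"""

SECOND_RUN = """\
32 sectors had been re-allocated before the start of the preclear.
32 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.
"""

if __name__ == "__main__":
    print(compare_runs(FIRST_RUN, SECOND_RUN))
```

    The "between_runs" figure is the one that matches the worry above: it isolates the 2 sectors that appeared while the drive was not being precleared.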

  8. ============================================================================
    ** Changed attributes in files: /tmp/smart_start_sdj  /tmp/smart_finish_sdj
                    ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
          Raw_Read_Error_Rate =   119     100            6        ok          207253708
             Spin_Retry_Count =   100     100           97        near_thresh 0
             End-to-End_Error =   100     100           99        near_thresh 0
              High_Fly_Writes =    85     100            0        ok          15
      Airflow_Temperature_Cel =    66      69           45        near_thresh 34
          Temperature_Celsius =    34      31            0        ok          34
       Hardware_ECC_Recovered =    56     100            0        ok          207253708
    No SMART attributes are FAILING_NOW
    
    0 sectors were pending re-allocation before the start of the preclear.
    0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
    0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
    0 sectors are pending re-allocation at the end of the preclear,
        the number of sectors pending re-allocation did not change.
    0 sectors had been re-allocated before the start of the preclear.
    30 sectors are re-allocated at the end of the preclear,
        a change of 30 in the number of sectors re-allocated.
    

     

    I think everything looks good except the "30 sectors are re-allocated at the end of the preclear". I know this is kind of what the preclear is supposed to do, but should I be worried? Most of the logs others have posted show 0 sectors. Is 30 troublesome? Should I run it again and see if the count continues to increase?

     

    BTW, this is a Seagate recertified drive that they sent me to replace a failed drive.
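
    As a rough aid for reading the attribute table above, this Python sketch parses the changed-attribute rows and sorts them by headroom, meaning NEW_VAL minus FAILURE_THRESHOLD (a SMART attribute is normally treated as failing once its normalized value drops to the threshold). The function name, the "headroom" label, and the SMART_CHANGES constant are illustrative assumptions, not part of the preclear script, and the sketch does not try to reproduce how the script decides on "near_thresh".

```python
# Rows copied from the preclear report above (hypothetical constant name).
SMART_CHANGES = """\
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   119     100            6        ok          207253708
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
          High_Fly_Writes =    85     100            0        ok          15
  Airflow_Temperature_Cel =    66      69           45        near_thresh 34
      Temperature_Celsius =    34      31            0        ok          34
   Hardware_ECC_Recovered =    56     100            0        ok          207253708
"""


def parse_smart_changes(table):
    """Parse 'NAME = NEW OLD THRESHOLD STATUS RAW' rows, sorted by headroom."""
    rows = []
    for line in table.splitlines():
        name, sep, rest = line.partition("=")
        fields = rest.split()
        # Skip the header and anything else that is not a five-column data row.
        if not sep or len(fields) != 5:
            continue
        new_val, old_val, threshold, status, _raw = fields
        rows.append({
            "name": name.strip(),
            "new": int(new_val),
            "old": int(old_val),
            "threshold": int(threshold),
            "status": status,
            "headroom": int(new_val) - int(threshold),
        })
    return sorted(rows, key=lambda row: row["headroom"])


if __name__ == "__main__":
    for row in parse_smart_changes(SMART_CHANGES):
        print(f"{row['name']:<24} headroom {row['headroom']:>3} ({row['status']})")
```

    Sorting by headroom puts End-to-End_Error and Spin_Retry_Count at the top, but note that neither value changed during the preclear; their thresholds simply sit close to the starting value of 100.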

  9. My client is just a few lines of test code in a C# application I am working on. It can do TCP just by replacing UdpClient with TcpClient and then changing c.Send to c.Client.Send.

     

                UdpClient c = new UdpClient("watchtower", 15490); // host, port
                Byte[] b = Encoding.ASCII.GetBytes("disk3\n");
                c.Send(b, b.Length);
                Thread.Sleep(250);
                b = Encoding.ASCII.GetBytes("disk7\n");
                c.Send(b, b.Length);
    

     

    I thought Wikipedia said that inetd would spawn a new server for every TCP request, though, which didn't seem as efficient as using a single server for UDP. Maybe I misunderstood.

  10. Thanks for the tips, it's been 10+ years since I've done any C programming or Unix programming. A lot of it came back easily but I'm not surprised some details (like _exit(127)) have left the building.

     

    Line for /etc/services:

    mdspinup           15490/udp # mdspinup server, portnumber/udp
    

     

    Line for /etc/inetd.conf:

    mdspinup dgram udp wait root /boot/custom/mdspinup_inetd/mdspinup.inetd mdspinup.inetd
    

  11. I haven't had time to play with this until last night. But I took WeeboTech's advice and wrote a C program to receive remote network commands that can be plugged into inetd. I've attached the code below for anyone interested. Basically, it lets you send commands like "disk3\n" over UDP to the port configured for the service (in /etc/services, referenced from inetd.conf), and it will spin up that disk by calling mdcmd. The only valid commands are "disk1\n" ... to ... "disk19\n" and "exit\n"; it will just ignore anything else it receives.

     

    I've had to implement a slight pause (250ms, although shorter may work as well) between commands in my client or else it only responds to the last one. I think that may be because I am using UDP (which, per Wikipedia, lets inetd use a single server instance). It may also work if you use TCP (perhaps without a pause), but I haven't tested it.

     

    Anyways, for posterity, here is mdcmd_inetd.c:

     

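    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <syslog.h>
    #include <unistd.h>
    
    int main(int argc, char **argv)
    {
      char p[4096];
      // Let the kernel reap finished children so they don't pile up as zombies.
      signal(SIGCHLD, SIG_IGN);
      openlog("mdcmd.inetd", LOG_PID|LOG_CONS, LOG_USER);
      // inetd hands us the socket as stdin.
      while (fgets(p, sizeof(p), stdin))
      {
        // Strip the trailing newline (and carriage return), if there is one.
        p[strcspn(p, "\r\n")] = 0;
        if (strcmp(p, "exit") == 0)
        {
          syslog(LOG_INFO, "Received exit request");
          break;
        }
        // Valid commands are "disk1" .. "disk19", i.e. 5 or 6 characters.
        if ((strlen(p) < 5) || (strlen(p) > 6))
        {
          continue;
        }
    
        if (strncmp(p, "disk", 4) != 0)
        {
          continue;
        }
    
        // Everything after "disk" must be digits, so "diskab" or "disk3x" is ignored.
        if (strspn(p+4, "0123456789") != strlen(p+4))
        {
          continue;
        }
    
        int disk = atoi(p+4);
        if ((disk < 1) || (disk > 19))
        {
          continue;
        }
    
        // Fork and execute the command to spin up the disk.
        if (!fork())
        {
          // Child: log with a fixed format string, never the received text itself.
          syslog(LOG_INFO, "Spinning up %s", p);
          closelog();
          // Restore default SIGCHLD handling for the program we are about to run.
          signal(SIGCHLD, SIG_DFL);
          execl("/root/mdcmd", "/root/mdcmd", "spinup", p+4, (char *)0);
          _exit(127); // Only reached if execl fails.
        }
      }
    
      closelog();
      exit(0);
    }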
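
    For anyone who wants to poke at the server from something other than C#, here is a minimal Python client sketch for the protocol described above: one "diskN\n" datagram per disk, with a pause between sends. The function names are made up for illustration; the host "watchtower" and port 15490 in the usage note are simply the values from the C# test code and the /etc/services line, so substitute your own.

```python
import socket
import time


def build_command(disk):
    """Return the datagram payload for one disk, e.g. b"disk3\\n"."""
    # The server only accepts disk1 .. disk19, so reject anything else up front.
    if not 1 <= disk <= 19:
        raise ValueError(f"disk number out of range: {disk}")
    return f"disk{disk}\n".encode("ascii")


def spin_up(host, port, disks, pause=0.25):
    """Send one spin-up datagram per disk and return how many were sent."""
    # Build every payload first so a bad disk number fails before anything is sent.
    payloads = [build_command(disk) for disk in disks]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for index, payload in enumerate(payloads):
            if index:
                # Pause between commands, mirroring the 250 ms delay described above.
                time.sleep(pause)
            sock.sendto(payload, (host, port))
    return len(payloads)
```

    Calling spin_up("watchtower", 15490, [3, 7]) would mirror the C# test client. UDP gives no acknowledgement, so the only confirmation is the "Spinning up" line in the server's syslog.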
    

  12. The commands are in the mdcmd interface to the "md" driver

     

    They are

    ...

     

    Thanks for all the info, a lot to digest for a Windows programmer.  I'll try to plow through it this weekend. WeeboTech, if you think you will be updating SPINCONTROL, please let me know and I may just wait for you to do your magic before jumping in myself.

     

    Actually taking a second, closer look at it, it's a lot more straightforward than I thought at first glance. Just need to fashion a TCP/IP interface to those commands.

  13. The new spin-up logic will immediately spin down a drive if its timer has expired and you do not either use one of the "spinup" commands or access the disk through the /dev/md device.

     

    Joe, I saw one of your other posts in the 4.5 announcement thread where you talk about the "spinup" commands that Tom provided, but I can't find any reference to those commands in the release notes or elsewhere. Can you give me a pointer?  Can the commands be executed remotely, maybe via HTTP?  I just need a way to send a spin-up command (preferably per disk, but I'd settle for spin-up all) remotely from a Windows computer. WeeboTech's tool will work if he wants to update it to not use hdparm, but I'm open to other ideas as well. Thanks.

  14. Ah, so when you say the web page could be out of sync, could the drives be spun up and it just doesn't show?  It does a lot of clicking like it is spinning them up, so unless it is spinning them back down immediately (which I guess could be the case), maybe they are spun up.  Is there another way besides the web page that I could check their status? I have unmenu installed, but it seems to also indicate they are spun down (there is no temperature shown for the drives).

     

    Regardless, thanks for the tool and the help.

  15. Long time since this thread has been active, but I'm looking for this sort of functionality and just gave it a try.  Everything seems to install and run properly, but the drives don't seem to spin up.

     

    root@WatchTower:~# spincontrol -U -a
    
    Spin up on /dev/sdk ata-SAMSUNG_HD103UJ_S13PJ1NPA00122\c
    
    /dev/sdk:
    setting standby to 242 (1 hours)
    
    Spin up on /dev/sdm ata-SAMSUNG_HD154UI_S1XWJ1KS807855\c
    
    /dev/sdm:
    setting standby to 242 (1 hours)
    
    Spin up on /dev/sdn ata-SAMSUNG_HD154UI_S1XWJ1KS807856\c
    
    /dev/sdn:
    setting standby to 242 (1 hours)
    
    Spin up on /dev/sde ata-SAMSUNG_HD501LJ_S0MUJ13P325950\c
    
    /dev/sde:
    setting standby to 242 (1 hours)
    
    

     

    That's some of the output (it continues for the rest of my drives).  I can hear the array in my closet making clicking noises like it is accessing drives, but when I check the unRaid webpage, it shows the drives as spun down (except the cache drive for some reason).
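
    One note on reading that output: the "setting standby to 242 (1 hours)" lines look like what hdparm prints when its -S option sets the drive's standby (spin-down) timeout, assuming spincontrol is calling hdparm here. The sketch below decodes the -S value the way the hdparm man page describes it; the function name is made up for illustration.

```python
def standby_timeout_seconds(value):
    """Decode an hdparm -S value into seconds, or None if no fixed timeout applies."""
    if not 0 <= value <= 255:
        raise ValueError(f"hdparm -S value out of range: {value}")
    if value == 0:
        return None                      # 0 disables the standby timeout
    if value <= 240:
        return value * 5                 # 1..240: multiples of 5 seconds
    if value <= 251:
        return (value - 240) * 30 * 60   # 241..251: 1..11 units of 30 minutes
    if value == 252:
        return 21 * 60                   # 252: 21 minutes
    if value == 255:
        return 21 * 60 + 15              # 255: 21 minutes plus 15 seconds
    return None                          # 253 is vendor-defined, 254 is reserved


if __name__ == "__main__":
    print(standby_timeout_seconds(242))
```

    A value of 242 is two 30-minute units, which matches the "(1 hours)" in the output.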

     

    Is it possible something has changed in the new versions of unRaid (I'm on 4.5) that spins the drives back down immediately after receiving these commands or something?

     

    I've also tried the telnet version, including fup (is fup available on the command line?), with the same results.
