woolooloo

December 30, 2018

First drive went well. It is missing a ton of files, but it is the new drive that was rebuilt. I'm starting the original drive now and hopefully am able to salvage additional files. It is all media backup so I can always re-rip, but it will be over 2TB of media if the old drive does not do any better, so quite a pain. Thanks for the tip.

December 29, 2018

Thank you, starting that now!

December 29, 2018

My array has been kind of neglected for a while. Anyways (I apologize for the long story about the perfect storm), I have been upgrading some drives, trying to consolidate some stuff from old 1.5TB drives onto new drives so I can get rid of them. Along the way I found one drive that was basically dead. I was able to get pretty much everything off of it onto another drive that had enough space, so it was still in the array while I finished consolidating the other drives, but it was empty.

Then I find another drive that is failing. Not as bad as the first, but a full 3TB drive that I do not have enough free room to move onto other drives. So I ordered a new 4TB drive and put it in to rebuild and expand the file system. Unfortunately the pretty much dead 1.5TB drive still in the array has all sorts of read errors while rebuilding. I do a check on the new drive and sure enough, even though UNRAID says it is full, Windows shows it only has 250GB of files and a spot check shows a lot of empty directories and corrupted files.

Knowing my array is pretty much bust with the multiple failed drives, I go ahead and nuke it and start up a new array with no parity, pulling all the old 1.5TB drives, just so I can access the new drive. I also put in the 3TB drive that was failing. I run reiserfsck --rebuild-tree /dev/md2 (in maintenance mode) on the new drive and it slowly churns away, finding problems. Realizing it will take all night, I start a second remote session and fire off reiserfsck --rebuild-tree against the failing 3TB drive too.

Then this morning I find out that FUCKING Windows decided last night would be a fine time to restart for an update (although I am continually trying to disable that crap). It kills both of my remote sessions, and both reiserfsck runs along with them, mid-process. After a server reboot, both of those drives are now showing as unformatted.

Is there any hope to recovering anything off of those drives?

November 7, 2018

I played around with this today. The board is a Gigabyte GA-MA780G-UD3H with two x16 slots (one running at x4) and three x1 slots. Two of the x1 slots are disabled if an x4 card is installed in the second x16 slot. Anyways, I had the x8 SASLP card in the x16 slot, and I tried the Syba x1 2-port SATA card in all the other slots (including the second x16 slot, which runs at x4). In some slots, the Syba card's BIOS would show up first and hang. In other slots the SASLP would show up, but the Syba card never did.

I finally moved the SASLP into the second x16 slot (running at x4 even though it is an x8 card) and put the Syba card in the first x16 slot. That boots properly: the SASLP card shows up first, then the Syba card, and everything boots so I can finally access the cache drive.

Unfortunately in this config the x8 SASLP card is crippled a bit running x4, but that's what my previous card was so at least I'm not going backwards.
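A rough back-of-the-envelope check on how much running at x4 actually costs, as a minimal sketch. It assumes the nominal PCIe figures of 250 MB/s per lane per direction for PCIe 1.x and 500 MB/s for PCIe 2.0 (after 8b/10b encoding), and it assumes all eight drives on the card are read at once, as in a parity check or rebuild. Which PCIe generation the second x16 slot on this board runs at isn't stated here, so the sketch takes it as a parameter. The function names are hypothetical.

```c
/* Nominal usable bandwidth per lane, per direction, in MB/s.
   Assumption: only the 8b/10b-encoded generations are covered. */
static int lane_mb_s(int gen)
{
  switch (gen) {
  case 1:  return 250;  /* PCIe 1.x: 2.5 GT/s */
  case 2:  return 500;  /* PCIe 2.0: 5 GT/s */
  default: return 0;    /* not covered by this sketch */
  }
}

/* Hypothetical helper: each drive's share of the link when every
   drive on the card streams at the same time. */
int per_drive_mb_s(int lanes, int gen, int drives)
{
  if (lanes <= 0 || drives <= 0)
    return 0;
  return lanes * lane_mb_s(gen) / drives;
}
```

With 8 drives, an x4 link works out to 125 MB/s per drive on PCIe 1.x and 250 MB/s on PCIe 2.0, versus 500 MB/s at x8 on PCIe 2.0. So the x4 limit would mostly matter during whole-array operations like parity checks rather than ordinary single-file reads.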

I looked through the BIOS and could not find anything that looked like it would change boot priorities or anything for the PCI-e slots so I'm out of ideas. I may get another 4TB drive and merge a couple of my older 1.5TB drives then put the cache on the SASLP and get rid of the Syba card altogether. But if anyone ever thinks of anything else, please let me know.

October 27, 2018

Some background: my system of 13 data drives plus parity and cache had been running with 6 drives on the motherboard SATA ports, 8 drives on a Supermicro AOC-SASLP-MV8 PCI-e x4 controller, and the final drive on a 2-port PCI-e x1 SATA card.

Recently the SASLP went bad and I swapped it out for its replacement, the AOC-SAS2LP-MV8, which is a PCI-e x8 variant.

When I booted it up the first time, the system hung on the 2-port PCI-e x1 SATA card. It showed the BIOS screen from that card but never moved on. After reseating everything and trying again, I got the same problem.

I swapped the 2 port card to a different PCI-e x1 slot and everything booted up and seemed to be fine. Then I noticed that my cache drive is not showing up - and that is the one that is on the 2 port card. I've moved the card around to different PCI-e x1 slots (3 of them) and even tried the other PCI-e x16 slot that the SASLP is not using. Depending on which slot it is in, either the 2 port card shows up before the SASLP during boot and the system hangs, or the card never shows up.

All this is happening before UNRAID starts, so I'm not sure if there is anything helpful in my log, but I'm attaching it anyways. If anyone has any ideas on how to sort out this apparent conflict between my new SAS2LP and my 2-port card, I would appreciate it.

syslog-new setup.txt

October 26, 2018

Replaced the SASLP controller with the current version of it and was able to get everything back up and running. Thanks for the help debugging everything!

October 18, 2018

It's happened multiple times now after reboot so I will look to identify which controller is attached to these drives and replace it. Thanks for the insight. I may need some help figuring out how to replace the controller while maintaining the integrity of the array - especially since one of the drives is dead so I can't just rebuild parity.

October 17, 2018

Ok sorry, got pulled into jury duty last week which turned my life upside down, finally digging my way out.

After starting a rebuild, the errors started basically immediately; my syslog grew to 128 MB of read errors by the time I stopped it a couple of minutes later. I've truncated it for posting and removed all the repetitive read errors that were at the end.

After stopping the rebuild, unRAID says there are 6 drives missing. Disk 10 is actually the one that I have replaced. My IcyDocks hold 4 drives, but if memory serves, at least one of my SATA cards has 6 drives on it. Could that card be going bad?

syslog-2018-10-17 - truncated.txt

October 8, 2018

I'm on 5.0.6. I do not use my server that actively and have not gotten around to upgrading to 6; it's been working fine with minimal effort for a couple of years. I recently noticed it had been about a year since I had done a parity check, so I kicked one off the other day. I came back to check the status today and one of my disks was disabled. It was an older 1.5TB drive and I had a spare one sitting around, so I went ahead and swapped it out.

Since then, I've been trying to get it to rebuild onto the new disk, but 1) the write count on the new drive never increments even though the % complete on the rebuild keeps going up and 2) after a while other drives start showing a massive number of errors and if I try to look at their contents, those disks show up as empty. Stopping the array then shows those drives as disabled. A reboot seems to clear it back up and starts another rebuild. If I stop the array and remove the new disk from the array, then start the array, it is showing all of the files on that disk through emulation, so everything still seems intact at this point. I just do not understand what is causing the rebuild to fail. I guess it could be a failing SATA controller or a failing IcyDock or other hardware component, but I'm not sure the best way to track it down and I don't want to nuke multiple disks through repeated attempts.

Anyone have thoughts on the best way to approach this? TIA

User Customizations · November 25, 2012

I ran it again and got this:

32 sectors had been re-allocated before the start of the preclear.
32 sectors are re-allocated at the end of the preclear,
    the number of sectors re-allocated did not change.

I'm glad that the preclear didn't find any more sectors to re-allocate, but I'm a little disturbed that the count went from 30 to 32 after the last preclear while the drive was sitting unused. I guess I'll go ahead and use the drive, but if anyone finds this really troubling, please let me know.

User Customizations · November 24, 2012

============================================================================
** Changed attributes in files: /tmp/smart_start_sdj  /tmp/smart_finish_sdj
                ATTRIBUTE   NEW_VAL OLD_VAL FAILURE_THRESHOLD STATUS      RAW_VALUE
      Raw_Read_Error_Rate =   119     100            6        ok          207253708
         Spin_Retry_Count =   100     100           97        near_thresh 0
         End-to-End_Error =   100     100           99        near_thresh 0
          High_Fly_Writes =    85     100            0        ok          15
  Airflow_Temperature_Cel =    66      69           45        near_thresh 34
      Temperature_Celsius =    34      31            0        ok          34
   Hardware_ECC_Recovered =    56     100            0        ok          207253708
No SMART attributes are FAILING_NOW

0 sectors were pending re-allocation before the start of the preclear.
0 sectors were pending re-allocation after pre-read in cycle 1 of 1.
0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.
0 sectors are pending re-allocation at the end of the preclear,
    the number of sectors pending re-allocation did not change.
0 sectors had been re-allocated before the start of the preclear.
30 sectors are re-allocated at the end of the preclear,
    a change of 30 in the number of sectors re-allocated.
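Reading the attribute table above, as a minimal sketch: it assumes the usual smartctl rule that an attribute only counts as FAILING_NOW once its normalized value drops to or below a nonzero threshold (a threshold of 0 never trips). The values are copied from the NEW_VAL and FAILURE_THRESHOLD columns; the struct and function names are hypothetical.

```c
#include <stddef.h>

struct smart_attr {
  const char *name;
  int value;   /* NEW_VAL: current normalized value */
  int thresh;  /* FAILURE_THRESHOLD */
};

/* NEW_VAL and FAILURE_THRESHOLD columns from the preclear report above. */
static const struct smart_attr sdj_finish[] = {
  { "Raw_Read_Error_Rate",     119,  6 },
  { "Spin_Retry_Count",        100, 97 },
  { "End-to-End_Error",        100, 99 },
  { "High_Fly_Writes",          85,  0 },
  { "Airflow_Temperature_Cel",  66, 45 },
  { "Temperature_Celsius",      34,  0 },
  { "Hardware_ECC_Recovered",   56,  0 },
};

/* Assumed rule: failing once the value reaches a nonzero threshold. */
int failing_now(const struct smart_attr *a)
{
  return a->thresh != 0 && a->value <= a->thresh;
}

/* Points left before the threshold; only meaningful when thresh != 0. */
int headroom(const struct smart_attr *a)
{
  return a->value - a->thresh;
}

size_t count_failing(const struct smart_attr *attrs, size_t n)
{
  size_t failing = 0;
  for (size_t i = 0; i < n; i++)
    failing += failing_now(&attrs[i]) ? 1 : 0;
  return failing;
}
```

count_failing(sdj_finish, 7) gives 0, matching "No SMART attributes are FAILING_NOW". Spin_Retry_Count and End-to-End_Error still read 100 (NEW_VAL and OLD_VAL) with raw values of 0. Their headroom is only 3 and 1 because their thresholds (97 and 99) sit just below the starting value, not because the values dropped during the preclear.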

I think everything looks good except the "30 sectors are re-allocated at the end of the preclear" line. I know this is kind of what the preclear is supposed to do, but should I be worried? Most of the logs others have posted show 0 sectors. Is 30 troublesome? Should I run it again and see if it continues to increase?

BTW, this is a Seagate recertified drive that they sent me to replace a failed drive.

User Customizations · July 1, 2010

My client is just a few lines of test code in a C# application I am working on. It can do TCP just by replacing UdpClient with TcpClient and then changing c.Send to c.Client.Send.

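            // Needs: using System; using System.Net.Sockets; using System.Text; using System.Threading;
            using (UdpClient c = new UdpClient("watchtower", 15490)) // host, port
            {
                Byte[] b = Encoding.ASCII.GetBytes("disk3\n");
                c.Send(b, b.Length);
                Thread.Sleep(250); // pause between commands; without it only the last one was handled
                b = Encoding.ASCII.GetBytes("disk7\n");
                c.Send(b, b.Length);
            }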

I thought Wikipedia said that inetd would spawn a new server for every TCP connection, though, which didn't seem as efficient as using a single server for UDP. Maybe I misunderstood.
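For anyone not on Windows, here is a minimal C sketch of the same test client, assuming a POSIX system with BSD sockets. It sends each command as its own datagram to the host and port from the snippet above ("watchtower", 15490) and sleeps between sends to mirror the 250 ms pause. The function names are hypothetical, and the 1 to 19 range follows the disk1..disk19 commands the server accepts.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>

/* Build "diskN\n" into buf; returns its length, or -1 if out of range. */
int format_spinup_command(char *buf, size_t size, int disk)
{
  if (disk < 1 || disk > 19)
    return -1;
  int n = snprintf(buf, size, "disk%d\n", disk);
  return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Send one datagram per disk, pausing pause_ms between them. 0 on success. */
int send_spinup_commands(const char *host, const char *port,
                         const int *disks, size_t count, long pause_ms)
{
  struct addrinfo hints, *res;
  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_DGRAM;
  if (getaddrinfo(host, port, &hints, &res) != 0)
    return -1;

  int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
  /* connect() on a UDP socket only fixes the default destination for send(). */
  if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
    if (fd >= 0)
      close(fd);
    freeaddrinfo(res);
    return -1;
  }
  freeaddrinfo(res);

  struct timespec pause = { pause_ms / 1000, (pause_ms % 1000) * 1000000L };
  int rc = 0;
  for (size_t i = 0; i < count; i++) {
    char buf[16];
    int len = format_spinup_command(buf, sizeof buf, disks[i]);
    if (len < 0 || send(fd, buf, (size_t)len, 0) != len) {
      rc = -1;
      break;
    }
    if (i + 1 < count)
      nanosleep(&pause, NULL);
  }
  close(fd);
  return rc;
}
```

With int disks[] = { 3, 7 }, calling send_spinup_commands("watchtower", "15490", disks, 2, 250) does the same thing as the C# snippet.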

User Customizations · July 1, 2010

Thanks for the tips, it's been 10+ years since I've done any C programming or Unix programming. A lot of it came back easily but I'm not surprised some details (like _exit(127)) have left the building.

Line for /etc/services:

mdspinup           15490/udp # mdspinup server, portnumber/udp

Line for /etc/inetd.conf:

mdspinup dgram udp wait root /boot/custom/mdspinup_inetd/mdspinup.inetd mdspinup.inetd
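For reference, the inetd.conf line above has the standard fields: service name (looked up in /etc/services, here 15490/udp), socket type, protocol, wait/nowait, user, server path, and the server's argv. The "wait" flag is what makes inetd hand the UDP socket to a single server process and stop listening until that process exits, while "nowait" (the usual setting for TCP "stream" services) starts a new server per connection. The hypothetical parser below only splits a line into those fields to make them explicit; a real inetd also handles comments and other details this sketch ignores.

```c
#include <stdio.h>
#include <string.h>

struct inetd_entry {
  char service[64];   /* name from /etc/services, e.g. mdspinup -> 15490/udp */
  char socktype[16];  /* "dgram" (UDP) or "stream" (TCP) */
  char proto[16];     /* "udp" or "tcp" */
  char wait[16];      /* "wait": one server owns the socket; "nowait": one per connection */
  char user[32];      /* account the server runs as */
  char path[256];     /* program inetd executes */
  char args[256];     /* the server's argv, starting with argv[0] */
};

/* Hypothetical helper: returns 0 if all seven fields were found, -1 otherwise. */
int parse_inetd_line(const char *line, struct inetd_entry *e)
{
  memset(e, 0, sizeof *e);
  int n = sscanf(line, "%63s %15s %15s %15s %31s %255s %255[^\n]",
                 e->service, e->socktype, e->proto, e->wait,
                 e->user, e->path, e->args);
  return n == 7 ? 0 : -1;
}
```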

User Customizations · July 1, 2010

I haven't had time to play with this until last night, but I took WeeboTech's advice and wrote a C program that receives remote network commands and can be plugged into inetd. I've attached the code below for anyone interested. Basically, it lets you open a UDP socket and send commands like "disk3\n" to the port configured in inetd.conf, and it will spin up that disk by calling mdcmd. The only valid commands are "disk1\n" through "disk19\n" and "exit\n"; it ignores anything else it receives.

I've had to add a slight pause (250 ms, although shorter may work as well) between commands in my client, or else it only responds to the last one. I think that may be because I am using UDP (which, per Wikipedia, uses a single server instance). It may also work over TCP (perhaps without a pause), but I haven't tested it.

Anyways, for posterity, here is mdcmd_inetd.c:

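#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <syslog.h>
#include <string.h>
#include <ctype.h>
#include <signal.h>

int main(int argc, char **argv)
{
  char p[4096];
  openlog("mdcmd.inetd", LOG_PID|LOG_CONS, LOG_USER);
  // Let the kernel reap finished children so they don't pile up as zombies.
  signal(SIGCHLD, SIG_IGN);
  // inetd passes the socket to us as stdin.
  while(fgets(p, sizeof(p), stdin))
  {
    // Remove the trailing newline (and CR, if present) from the string.
    p[strcspn(p, "\r\n")] = 0;
    if (strcmp(p, "exit") == 0)
    {
      syslog(LOG_INFO, "Received exit request");
      break;
    }
    size_t len = strlen(p);
    if ((len < 5) || (len > 6))
    {
      continue;
    }

    if (strncmp(p, "disk", 4) != 0)
    {
      continue;
    }

    // Only accept one or two digits with no leading zero: "disk1" .. "disk19".
    if (!isdigit((unsigned char)p[4]) || (p[4] == '0') ||
        ((len == 6) && !isdigit((unsigned char)p[5])))
    {
      continue;
    }

    int disk = atoi(p+4);
    if (disk > 19)
    {
      continue;
    }

    // Fork and execute the command to spin up the disk.
    pid_t pid = fork();
    if (pid < 0)
    {
      syslog(LOG_ERR, "fork failed for %s", p);
    }
    else if (pid == 0)
    {
      // Child: restore default SIGCHLD handling for mdcmd, and log with a
      // fixed format string rather than passing client text as the format.
      signal(SIGCHLD, SIG_DFL);
      syslog(LOG_INFO, "Spinning up %s", p);
      closelog();
      execl("/root/mdcmd", "/root/mdcmd", "spinup", p+4, (char *)0);
      _exit(127); // Only reached if execl() fails.
    }
  }

  closelog();
  exit(0);
}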
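If anyone wants to test the command handling without going through inetd, the validation can be pulled out into a pure function. This is a hypothetical refactor, not part of the program above: it returns the disk number for "disk1" through "disk19", 0 for "exit", and -1 for anything else, matching the command set described earlier. A trailing CR or LF is ignored.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: 1..19 for "diskN", 0 for "exit", -1 otherwise. */
int parse_mdspinup_command(const char *line)
{
  size_t len = strcspn(line, "\r\n");  /* length without the line ending */
  if (len == 4 && strncmp(line, "exit", 4) == 0)
    return 0;
  if ((len != 5 && len != 6) || strncmp(line, "disk", 4) != 0)
    return -1;
  /* One or two digits, no leading zero. */
  if (!isdigit((unsigned char)line[4]) || line[4] == '0')
    return -1;
  if (len == 6 && !isdigit((unsigned char)line[5]))
    return -1;
  int disk = line[4] - '0';
  if (len == 6)
    disk = disk * 10 + (line[5] - '0');
  return disk <= 19 ? disk : -1;
}
```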

User Customizations · February 17, 2010

Quick -n- dirty way of exposing the mdcmd via tcpip.

Cool, thanks for that, I'll play around with it and see how it works.

User Customizations · February 17, 2010

I'll take a look at my script and update it with the new interface.

It might be quicker to expose the mdcmd via inetd or an unmenu plugin and let the client program have the intelligence to decide what to do.

Thanks. Also thanks for pointing out inetd, that's a handy tool I'll take a closer look at.

User Customizations · February 17, 2010

The commands are in the mdcmd interface to the "md" driver

They are

...

Thanks for all the info, a lot to digest for a Windows programmer. I'll try to plow through it this weekend. WeeboTech, if you think you will be updating SPINCONTROL, please let me know and I may just wait for you to work your magic before jumping in myself.

Actually taking a second, closer look at it, it's a lot more straightforward than I thought at first glance. Just need to fashion a TCP/IP interface to those commands.

User Customizations · February 17, 2010

The new spin-up logic will immediately spin down a drive if its timer has expired and you do not either use one of the "spinup" commands or access the disk through the /dev/md device.

Joe, I saw one of your other posts in the 4.5 announcement thread where you talk about the "spinup" commands that Tom provided, but I can't find any reference to those commands in the release notes or elsewhere. Can you give me a pointer? Can the commands be executed remotely, maybe via HTTP? I just need a way to send a spin-up command (preferably per drive, but I'd settle for spin-up all) remotely from a Windows computer. WeeboTech's tool will work if he wants to update it to not use hdparm, but I'm open to other ideas as well. Thanks.

User Customizations · February 17, 2010

Ah, so when you say the web page could be out of sync, could the drives be spun up and it just doesn't show? It does a lot of clicking like it is spinning them up, so unless it is spinning them back down immediately (which I guess could be the case), maybe they are spun up. Is there another way besides the web page that I could check their status? I have unmenu installed, but it seems to also indicate they are spun down (there is no temperature shown for the drives).

Regardless, thanks for the tool and the help.
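On checking spin state without the web page: hdparm -C reports it by sending the ATA CHECK POWER MODE command (opcode 0xE5), which answers without spinning the drive up. Below is a minimal C sketch of the same idea, assuming Linux, root access, and a controller driver that passes the HDIO_DRIVE_CMD ioctl through to the drive (drives behind some SAS controllers may not support it). The enum and function names are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/hdreg.h>

#define ATA_CHECK_POWER_MODE 0xE5  /* ATA opcode; does not spin the drive up */

enum power_mode { PM_STANDBY, PM_IDLE, PM_ACTIVE_IDLE, PM_UNKNOWN };

/* Decode the sector count register returned by CHECK POWER MODE. */
enum power_mode decode_power_mode(unsigned char count)
{
  switch (count) {
  case 0x00: return PM_STANDBY;      /* spun down */
  case 0x80: return PM_IDLE;
  case 0xFF: return PM_ACTIVE_IDLE;
  default:   return PM_UNKNOWN;
  }
}

/* Returns -1 if the device can't be opened or the ioctl isn't supported. */
int check_power_mode(const char *dev, enum power_mode *mode)
{
  /* HDIO_DRIVE_CMD layout: { command, sector number, feature, sector count } */
  unsigned char args[4] = { ATA_CHECK_POWER_MODE, 0, 0, 0 };
  int fd = open(dev, O_RDONLY | O_NONBLOCK);
  if (fd < 0)
    return -1;
  int rc = ioctl(fd, HDIO_DRIVE_CMD, args);
  close(fd);
  if (rc != 0)
    return -1;
  /* On return, args[2] holds the sector count register, which encodes the mode. */
  *mode = decode_power_mode(args[2]);
  return 0;
}
```

For example, check_power_mode("/dev/sdk", &mode) on one of the drives from the spincontrol output should show whether it really is in standby, independent of what the web page says.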

User Customizations · February 16, 2010

Long time since this thread has been active, but I'm looking for this sort of functionality and just gave it a try. Everything seems to install and run properly, but the drives don't seem to spin up.

root@WatchTower:~# spincontrol -U -a

Spin up on /dev/sdk ata-SAMSUNG_HD103UJ_S13PJ1NPA00122\c

/dev/sdk:
setting standby to 242 (1 hours)

Spin up on /dev/sdm ata-SAMSUNG_HD154UI_S1XWJ1KS807855\c

/dev/sdm:
setting standby to 242 (1 hours)

Spin up on /dev/sdn ata-SAMSUNG_HD154UI_S1XWJ1KS807856\c

/dev/sdn:
setting standby to 242 (1 hours)

Spin up on /dev/sde ata-SAMSUNG_HD501LJ_S0MUJ13P325950\c

/dev/sde:
setting standby to 242 (1 hours)

That's some of the output (it continues for the rest of my drives). I can hear the array in my closet making clicking noises like it is accessing drives, but when I check the unRAID web page, it shows the drives as spun down (except the cache drive for some reason).
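The "setting standby to 242 (1 hours)" lines are hdparm -S output, so the spin-up step sets each drive's standby timer to 242. The hdparm man page documents the encoding: 0 disables the timer, 1 to 240 are multiples of 5 seconds, 241 to 251 are 1 to 11 units of 30 minutes, 252 is 21 minutes, 253 is vendor-defined, 254 is reserved, and 255 is 21 minutes 15 seconds. A small hypothetical decoder:

```c
/* Decode an hdparm -S standby value into seconds.
   Returns 0 for "timer disabled" and -1 for values with no fixed duration. */
long standby_seconds(int v)
{
  if (v == 0)
    return 0;                          /* timeouts disabled */
  if (v >= 1 && v <= 240)
    return v * 5L;                     /* 5-second units: 5 s .. 20 min */
  if (v >= 241 && v <= 251)
    return (v - 240) * 30L * 60;       /* 30-minute units: 30 min .. 5.5 h */
  if (v == 252)
    return 21L * 60;                   /* 21 minutes */
  if (v == 255)
    return 21L * 60 + 15;              /* 21 minutes 15 seconds */
  return -1;                           /* 253 vendor-defined, 254 reserved, or out of range */
}
```

standby_seconds(242) is 2 × 30 minutes = 3600 seconds, which is the "1 hours" in the output.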

Is it possible something has changed in the newer versions of unRAID (I'm on 4.5) that spins the drives back down immediately after receiving these commands?

I've also tried the telnet version, including fup (is fup available on the command line?), with the same results.

Posts posted by woolooloo

reiserfsck failed, disk unformatted

2 port SATA card not working

Disk disabled - rebuild failing

Preclear.sh results - Questions about your results? Post them here.

SPINCONTROL: Local and Remote Management of Drive Standby State.

SPINCONTROL: Local and Remote Management of Drive Standby State.