calypsocowboy

Posts posted by calypsocowboy

  1. So I think I'm going to go with option 2 and do the new config. I'll then mount disk3 on another machine, see what data I can get off it, and copy it back to the server.

     

    I believe this is the correct procedure: https://wiki.lime-technology.com/UnRAID_6_2/Storage_Management#Reset_the_array_configuration

     

    One additional question: I'm assuming that resetting the array rebuilds the parity drive, so I'm guessing it makes sense to wait to do this until I get my new, larger parity drive. Is that correct?

     

    Once that all completes, then I'll preclear my old parity drive and add it back in. 

     

    BTW, thanks for the help. I definitely need to get notifications set up on the server.

  2. Okay, at this point I've copied most of what I think are my (non-replaceable) files off the array. I haven't checked all the files to see if they are okay. What's left on the array is mostly music and movies, about 8.8TB worth, that I can replace but would prefer not to. I've shut down the array to prevent further writes to it.

     

    My current array is a 4TB parity drive plus one 4TB and three 2TB data drives. As it sits right now, I'm using 8.8TB, so I don't have room to pull the failing 2TB drive out now. In a week, I'll have a new motherboard and/or a new controller card, and a new 8TB drive I was planning to use for parity. What's the best way to bring things back up? At this point it sounds like trusting parity to rebuild the drive wouldn't be a good idea.

     

    My initial thought was something along the lines of clearing my 4TB parity drive, bringing up all 4 other disks, copying the data from the failing drive over to a good one, and removing the failing drive from the array. All of this would, I'm assuming, be done with the array unprotected. Then, once the failing 2TB drive is out and the 4TB drive is in, put the 8TB drive in and rebuild parity. Lastly, hope that I didn't lose too much data.
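
The capacity constraint described in this post can be checked with quick arithmetic. This is only a minimal sketch using the drive sizes and the 8.8TB usage figure quoted above; the helper name `fits` is made up for illustration, and the check ignores filesystem overhead and how files are spread across individual disks.

```python
# Drive sizes (TB) and usage are taken from the post; the rest is illustrative.
DATA_DRIVES_TB = [4, 2, 2, 2]   # one 4TB and three 2TB data drives
OLD_PARITY_TB = 4               # current parity drive, later reused for data
USED_TB = 8.8


def fits(drives_tb, used_tb):
    """Return True if the used data fits in the combined raw capacity."""
    return sum(drives_tb) >= used_tb


# Pull one 2TB drive without adding anything: 4 + 2 + 2 = 8TB.
without_failing = [4, 2, 2]

# Pull one 2TB drive and reuse the old 4TB parity drive as data: 12TB.
with_old_parity = without_failing + [OLD_PARITY_TB]

print(fits(DATA_DRIVES_TB, USED_TB))   # current layout, 10TB of data space
print(fits(without_failing, USED_TB))  # 8TB of data space
print(fits(with_old_parity, USED_TB))  # 12TB of data space
```

With these numbers, 8TB is short of the 8.8TB in use, which is why the failing drive cannot simply be pulled, while moving the old parity drive into a data slot would leave room.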

  3. I have a drive showing a red X: device is disabled. About my system: I'm running Unraid 6.2 on an older Supermicro PDSMi board with an RR1U-ELi riser card and a SuperMicro AOC-SASLP-MV8 card. The drive that is having problems is connected to that card. It's a 4TB Seagate drive that I shucked from an enclosure I got at Costco a number of years back. I have two of those drives in the computer. I believe my mobo BIOS is up to date; I'm not sure what BIOS the card is running.

     

    This first happened about 3 months ago. At the time, I stopped the array, powered down the server, pulled and reseated the power and data cables, and restarted the server. The SMART report came back clean on the drive, so I went through the process of adding the drive back into the array. Things seemed to be working well for a bit.

     

    About a week ago, I noticed the same thing. I took the same steps, only this time I connected the drive to a different end of the breakout cable, since the one I had it connected to looked a little suspect. Same process: clean SMART report, added the drive back in.

     

    And now, back comes the red X, same drive.

     

    So now I'm trying to figure out what's next. I'm not sure the drive is bad, because after each reboot it comes back clean. I've tried different ends on the breakout cable (Monoprice). I could try ordering a new cable to see if that might be the issue. I'm not sure if it's the expansion card, the riser, or the combination. I don't think my mobo supports the card without the riser.

     

    Any thoughts? Or am I to the point where I need to look for a new expansion card, or maybe a mobo that supports more SATA connections or supports the card I have directly instead of via a riser?

     

    Current Diagnostics - cascade-diagnostics-20170527-1408.zip

    Last Week's Diagnostics - cascade-diagnostics-20170521-0812.zip

  4. I'm upgrading from rc12, I think, to 5 final. The upgrade seemed to go fine: I installed the key, rebooted, and for the most part I can get to the console okay on reboot.

     

    I can get to the GUI without problems for a bit, then all of a sudden, nothing. I can't get to it via server name or IP. The server is still up and functioning, serving files, and I can get to it via telnet; it's just the console that appears to be dead. I'm seeing a lot of these errors in my syslog:

     

    Sep  7 17:27:24 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.

    Sep  7 17:29:04 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.

    Sep  7 17:30:44 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.

    Sep  7 17:32:23 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.

    Sep  7 17:34:03 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
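
To see whether these messages follow a pattern, the excerpt can be summarized with a few lines of Python. This is a rough sketch, not a diagnosis: the regular expression is written only against the five lines quoted above, the function name `parse` is made up, and it assumes all lines fall on the same day so that only the time of day needs comparing.

```python
import re

# The five syslog lines quoted in the post.
LOG = """\
Sep  7 17:27:24 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep  7 17:29:04 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep  7 17:30:44 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep  7 17:32:23 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep  7 17:34:03 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
"""

# Captures hour, minute, second, and the offending host address.
PATTERN = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}) \S+ avahi-daemon\[\d+\]: "
    r"Invalid response packet from host (\d+\.\d+\.\d+\.\d+)\."
)


def parse(log):
    """Return (seconds_since_midnight, host) for each matching line."""
    events = []
    for line in log.splitlines():
        match = PATTERN.search(line)
        if match:
            hour, minute, second, host = match.groups()
            seconds = int(hour) * 3600 + int(minute) * 60 + int(second)
            events.append((seconds, host))
    return events


events = parse(LOG)
hosts = sorted({host for _, host in events})
gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]

print(hosts)  # distinct hosts named in the messages
print(gaps)   # seconds between consecutive messages
```

In this excerpt, all five messages name the same host and arrive roughly every 100 seconds.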

     

    This is pretty much a fresh install, no plugins and the stock 5.0 web gui.

     

    SMB settings are as follows:

     

    Enable SMB: Yes (Workgroup)

     

    Workgroup:  BAWDENHOME

    Local master: No Yes 

     

     

    -Josh

  5. I was having the same errors on my SuperMicro board. I honestly don't know what fixed it. I know Tom changed a timeout value in a later release and that helped, but it seemed to be more kernel related, as the newer kernel worked.

     

    My last BIOS update for my board was over 5 years ago, so I'm tempted to think it's mobo related. New boards keep getting faster and new OSes keep coming out; sometimes I think they just don't work as well on older hardware.

     

    I also got a new flash drive and did notice it was faster.

     

    Sorry I'm not more help.

  6. I'm currently using a Lexar Firefly that is only a couple of years old. I've run a disk check on it and it comes back fine.

     

    I also tried another, newer USB drive I have lying around. Same thing: the system behaved the same way. Also, if the flash drive were going bad, I would not have expected rc13 to work.

     

    It seems to be related somehow to this: http://lime-technology.com/forum/index.php?topic=25250.msg220900#msg220900 For whatever reason, the USB drive isn't mounting, and it only waits so long.

     

    I do see a lot of "ata5: link is slow to respond" errors on the console, but I thought those were due to ATA ports on the mobo that are not in use.

     

    Jun 14 20:50:43 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk

    Jun 14 20:50:43 Tower kernel: scsi 8:0:0:0: Direct-Access    Lexar    JD FireFly      1100 PQ: 0 ANSI: 0 CCS

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] 3915776 512-byte logical blocks: (2.00 GB/1.86 GiB)

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Write Protect is off

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Mode Sense: 43 00 00 00

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through

    Jun 14 20:50:43 Tower kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0

    Jun 14 20:50:43 Tower kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0

    Jun 14 20:50:43 Tower kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: Attached scsi generic sg3 type 0

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through

    Jun 14 20:50:43 Tower kernel:  sdd: sdd1

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through

    Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Attached SCSI removable disk

    Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)

    Jun 14 20:50:43 Tower kernel: ata5: COMRESET failed (errno=-16)

    Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)

    Jun 14 20:50:43 Tower kernel: ata5: COMRESET failed (errno=-16)

    Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)

    Jun 14 20:50:43 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

  7. At the console, I'm seeing a number of "waiting for /dev/disk/by-label/UNRAID" messages. There are about 10 or so of them, then "/dev/disk/by-label/UNRAID not found. Mounting non-root local file systems."

     

    If -rc15 is coming out with the 3.9.6 kernel, I'll just see how that goes, as -rc13 worked.

     

    -Josh

  8. Can someone help me understand when in the boot process the USB drive gets mounted as /boot or as UNRAID, and whether there is a limit on how long it waits for the drive to be loaded before moving on?

     

    I'm trying to troubleshoot an error. rc10 worked, 11-12 no workie, 13 worked, 14 no workie. It appears to be related to the USB drive not getting mounted or being mounted improperly. The only thing I can somewhat trace it to is the kernel.

     

    From my limited Unix knowledge, it appears the drive is visible to the system, but something is going on. The system comes up and emhttp doesn't appear to load. I can start it manually, but it has problems.

     

    root@Tower:/# ls /dev/disk/by-id/*

    /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0310325@        /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA2802606@

    /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0310325-part1@  /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA2802606-part1@

    /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA2802606@        /dev/disk/by-id/usb-Lexar_JD_FireFly_TXVSZS46RZ0JRC7V5WG1-0:0@

    /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA2802606-part1@  /dev/disk/by-id/usb-Lexar_JD_FireFly_TXVSZS46RZ0JRC7V5WG1-0:0-part1@

    /dev/disk/by-id/ata-WDC_WD20EARS-00S8B1_WD-WCAVY2836928@          /dev/disk/by-id/wwn-0x50014ee00261e6e5@

    /dev/disk/by-id/ata-WDC_WD20EARS-00S8B1_WD-WCAVY2836928-part1@    /dev/disk/by-id/wwn-0x50014ee00261e6e5-part1@

    /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WCAVY2836928@        /dev/disk/by-id/wwn-0x50014ee2041dd087@

    /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WCAVY2836928-part1@  /dev/disk/by-id/wwn-0x50014ee2041dd087-part1@

    /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0310325@        /dev/disk/by-id/wwn-0x50014ee65608294e@

    /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0310325-part1@  /dev/disk/by-id/wwn-0x50014ee65608294e-part1@

    root@Tower:/# ls /dev/disk/by-label/*

    /dev/disk/by-label/UNRAID@

     

    Should I try unmounting /boot, remounting it manually, and starting emhttp?
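
For readers unfamiliar with the "wait for the device, then give up" behavior asked about here, the general pattern looks like the sketch below. It is not unRAID's actual boot script: the function name `wait_for_path`, the 30-second timeout, and the 1-second polling interval are all assumptions chosen for illustration. Only the path `/dev/disk/by-label/UNRAID` comes from the console messages.

```python
import os
import time


def wait_for_path(path, timeout_s=30.0, interval_s=1.0,
                  exists=os.path.exists, sleep=time.sleep):
    """Poll until `path` exists or `timeout_s` seconds have elapsed.

    Returns True if the path appeared, False if the wait timed out.
    `exists` and `sleep` are injectable so the loop can be tested
    without real devices or real delays.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical usage: wait briefly for the flash drive's label symlink.
    label = "/dev/disk/by-label/UNRAID"
    if wait_for_path(label, timeout_s=3.0):
        print(label, "is present")
    else:
        print(label, "not found before the timeout")
```

If a real boot script uses a fixed wait like this, a device that takes longer than the timeout to enumerate would be reported as not found even though it shows up later, which would be consistent with the label being visible once the system is up.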

     

    Thanks,

    Josh

    syslog14a.txt

  9. I don't know what it is about that 3.4 kernel, but my mobo doesn't like it. It never seems to find my USB drive. Mine doesn't even come up with rc14. I went back to 13 and it was fine. I've rebooted a few times with 13 and had no issues with stopping the array.

     

    I feel for you Tom, trying to find something that works for everyone.

    syslog14.txt

  10. It's an older system. I'm using the most current BIOS, but it's still from 2008.

     

    SuperMicro PDSMi http://www.supermicro.com/products/motherboard/pd/E7230/pdsmi.cfm

    - Intel® E7230 (Mukilteo) Chipset

    - 1x Intel® 82573L PCI-e Gigabit LAN

    - 1x Intel® 82573V PCI-e Gigabit LAN

    - SATA ICH7R Controller

    - ATI RageXL Graphics

     

    Intel Pentium D 2.8

    4GB Memory

    Lexar Firefly USB Drive

    WD Green Drives

    No add-in cards installed at this time.

  11. I went back through the RC versions. It looks like I can get 10 to come up, and the array starts successfully. 10 has come up for me twice, and twice it has hung with a "Waiting for USB Subsystem" message on the screen for more than 5 minutes. The two times it has come up, I've seen the same error on the screen and nothing in the log, but the system did eventually come up. I've attached the syslog from 10. Right now I'm waiting for the permission utility to do its thing before doing too much.

     

    So I know 10 is on kernel 3.4.24 and 11a is on 3.4.26. Is my problem likely a kernel error or an unRaid error?

     

    -Josh

    syslog10_running2.txt

  12. Yes, I did read through the Wiki article but didn't see anything that stood out relating to this. I'm using the stock go file, which matches the wiki page. Going by the release page, in the past I've tried just copying over the bzimage and bzroot files; that didn't work. So I've set the drive up as I mentioned above: reformatted, with freshly downloaded 12a files (checksum matched) plus my Shares folder, disk.cfg, ident.cfg, network.cfg, share.cfg, and super.dat.

     

    At this point the server isn't "coming up all the way": emhttp isn't running because the boot isn't getting to the go script. So the array is showing "Stopped: no devices". I can see my three disks, just not my flash drive, which is where all the config is.

     

    -Josh

     

  13. It still seems to be related to mounting the flash drive and/or drives. I've tried different flash drives and had the same results, so I don't think it's my flash drive. I've also tried different USB slots on the MB, with the same result. I was able to get emhttp started manually and got the web server up. It shows a flash drive but no info. I was able to get another syslog after I started emhttp.

     

    I tried 11a and got similar errors, but I only tried it briefly.

    12astartup.PNG.90ee7e056cd6f36573508f7867a65ad4.PNG

    syslog05242013_512a_w_emhttp.txt

  14. Update: I've reformatted my flash drive and put the 12a files on it, plus just the following from my old install: the Shares folder, disk.cfg, ident.cfg, network.cfg, share.cfg, and super.dat. I left the go file from 12a and I didn't copy over a secrets.tdb file.

     

    Still no go, but I was able to telnet into the server with PuTTY and get some screenshots of the syslog. I'm seeing the following, which may be where it is stopping:

     

    May 25 20:55:07 Tower udevd[670]: worker [688] unexpectedly returned with status 0x0100

    May 25 20:55:07 Tower udevd[670]: worker [688] failed while handling '/devices/pci0000:00/0000:00:1f.2'

    May 25 20:55:07 Tower udevd[670]: worker [708] unexpectedly returned with status 0x0100

    May 25 20:55:07 Tower udevd[670]: worker [708] failed while handling '/devices/pci0000:00/0000:00:1c.5/0000:0e:00.0'

    May 25 20:55:07 Tower udevd[670]: worker [701] unexpectedly returned with status 0x0100

    May 25 20:55:07 Tower udevd[670]: worker [701] failed while handling '/devices/pci0000:00/0000:00:1e.0/0000:0f:02.0'

    May 25 20:55:08 Tower udevd[670]: worker [803] unexpectedly returned with status 0x0100

    May 25 20:55:08 Tower udevd[670]: worker [803] failed while handling '/devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2:1.0/host8/target8:0:0/8:0:0:0'
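
One way to make these lines easier to read is to pull out just the worker IDs and device paths. The sketch below is written only against the udevd lines quoted above; the names `Failure` and `failed_devices` are made up for illustration, and the `/usb` check is a simple substring test on the sysfs path, not a statement about what caused the failures.

```python
import re
from typing import NamedTuple

# The udevd lines quoted in the post.
LOG = """\
May 25 20:55:07 Tower udevd[670]: worker [688] unexpectedly returned with status 0x0100
May 25 20:55:07 Tower udevd[670]: worker [688] failed while handling '/devices/pci0000:00/0000:00:1f.2'
May 25 20:55:07 Tower udevd[670]: worker [708] unexpectedly returned with status 0x0100
May 25 20:55:07 Tower udevd[670]: worker [708] failed while handling '/devices/pci0000:00/0000:00:1c.5/0000:0e:00.0'
May 25 20:55:07 Tower udevd[670]: worker [701] unexpectedly returned with status 0x0100
May 25 20:55:07 Tower udevd[670]: worker [701] failed while handling '/devices/pci0000:00/0000:00:1e.0/0000:0f:02.0'
May 25 20:55:08 Tower udevd[670]: worker [803] unexpectedly returned with status 0x0100
May 25 20:55:08 Tower udevd[670]: worker [803] failed while handling '/devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2:1.0/host8/target8:0:0/8:0:0:0'
"""


class Failure(NamedTuple):
    worker: int   # udevd worker process ID shown in brackets
    device: str   # sysfs device path the worker was handling


FAILED = re.compile(
    r"udevd\[\d+\]: worker \[(\d+)\] failed while handling '([^']+)'"
)


def failed_devices(log):
    """Extract one Failure per 'failed while handling' line."""
    return [Failure(int(m.group(1)), m.group(2)) for m in FAILED.finditer(log)]


failures = failed_devices(LOG)
usb_failures = [f for f in failures if "/usb" in f.device]

for failure in failures:
    print(failure.worker, failure.device)
print("USB-attached:", len(usb_failures))
```

Of the four device paths in the excerpt, one sits under `usb1` and ends in `host8/target8:0:0/8:0:0:0`.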

     

    I've heard about these being plugin related, but with fresh 12a files and only the files listed above, I don't have any plugins installed. I've attached the syslog.

     

    PS. I copied over my 4.7 files and the server boots right up.

     

    -Josh

    syslog05242013_512a.txt