calypsocowboy

Everything posted by calypsocowboy

  1. So I think I'm going to go with option 2 and do the new config. I'll then mount disk3 on another machine, see what data I can get off it, and copy it back to the server. I believe these are the correct procedures: https://wiki.lime-technology.com/UnRAID_6_2/Storage_Management#Reset_the_array_configuration One additional question: I'm assuming that resetting the array rebuilds the parity drive, so I'm guessing it makes sense to wait to do this until I get my new, larger parity drive. Is that correct? Once that all completes, I'll preclear my old parity drive and add it back in. BTW, thanks for the help. I definitely need to get notifications set up on the server.
  2. Okay, at this point I've copied most of what I think are my (non-replaceable) files off the array. I haven't checked all the files to see if they are okay. What's left on the array is mostly music and movies, about 8.8TB worth, that I can replace but would prefer not to. I've shut down the array to prevent further writes to it. My current array is 4TB parity, 4TB, 3x2TB data drives. As it sits right now, I'm using 8.8TB, so I don't have room to pull the failing 2TB drive out now. In a week, I'll have a new motherboard and/or a new controller card, plus a new 8TB drive I was planning on using for parity. What's the best way to bring things back up? At this point, it sounds like trusting parity to rebuild the drive wouldn't be a good idea. My initial thought was something along the lines of clearing my 4TB parity drive, bringing up all 4 other discs, copying the data from the failing drive over to a good one, and removing the failing drive from the array. I'm assuming all of this would be done with the array unprotected. Then, once the failing 2TB drive is out and the 4TB drive is in, put the 8TB drive in and rebuild parity. Lastly, hope that I didn't lose too much data.
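A rough sketch of the space arithmetic in the post above. The drive sizes and the 8.8TB figure come from the post; the helper name is hypothetical, and the numbers are nominal capacities that ignore filesystem overhead, so treat the result as a ballpark only.

```python
# Nominal sizes in TB, taken from the post. Real usable space will be a bit lower.
DATA_DRIVES_TB = [4, 2, 2, 2]   # 4TB + 3x2TB data drives
USED_TB = 8.8                   # data currently on the array
OLD_PARITY_TB = 4               # old parity drive, reused as data in the plan

def headroom_after_removing(drives_tb, used_tb, remove_tb):
    """Free space (TB) left if one drive of size remove_tb is pulled.

    Hypothetical helper for illustration; negative means the data will not fit.
    """
    remaining = list(drives_tb)
    remaining.remove(remove_tb)
    return sum(remaining) - used_tb

# Pulling the failing 2TB drive from the array as it stands today.
today = headroom_after_removing(DATA_DRIVES_TB, USED_TB, 2)

# Same, but with the old 4TB parity drive cleared and added as a data drive first.
with_old_parity = headroom_after_removing(DATA_DRIVES_TB + [OLD_PARITY_TB], USED_TB, 2)

print(f"Headroom today: {today:.1f} TB")
print(f"Headroom with old parity as data: {with_old_parity:.1f} TB")
```

On these nominal numbers the array is short by roughly 0.8TB if the 2TB drive is pulled today, which is why the plan needs the 4TB drive brought in as data first.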
  3. I have a drive showing a red X: device is disabled. About my system: I'm running Unraid 6.2 on an older Supermicro PDSMi board with an RR1U-ELi riser card and a SuperMicro AOC-SASLP-MV8 card. The drive that is having problems is connected to that card. It's a 4TB Seagate drive that I shucked from an enclosure I got at Costco a number of years back. I have two of those drives in the computer. I believe my mobo BIOS is up to date; I'm not sure what BIOS the card is running. This first happened about 3 months ago. At the time, I stopped the array, powered down the server, pulled and reseated the power and data cables, restarted the server, and checked the SMART report. It came back clean on the drive, so I went through the process of adding the drive back into the array. Things seemed to be working well for a bit. About a week ago, I noticed the same thing. I took the same steps, only this time I connected the drive to a different end of the breakout cable, since the one I had it connected to looked a little suspect. Same process: clean SMART report, added the drive back in. And now, back comes the red X, same drive. So now I'm trying to figure out what's next. I'm not sure the drive is bad, because after each reboot it comes back clean. I've tried different ends on the breakout cable (Monoprice). I could try ordering a new cable to see if that might be the issue. I'm not sure if it's the expansion card, or maybe the riser, or the combo. I don't think my mobo supports just the card without the riser. Any thoughts, or am I to the point where I need to look for a new expansion card, or maybe a mobo that supports more SATA connections or supports the card I have directly instead of via a riser? Current Diagnostics - cascade-diagnostics-20170527-1408.zip Last Week's Diagnostics - cascade-diagnostics-20170521-0812.zip
  4. 20130908 is the current log. At this point the web console is still responding, and parity is rebuilding right now. 20130907 is from when the server was unresponsive, just before I triggered an unclean shutdown. syslog20130907.txt
  5. I'm upgrading from rc12, I think, to 5 final. The upgrade seemed to go fine: I installed the key, rebooted, and for the most part I can get to the console okay on reboot. I can get to the GUI without problems for a bit, then all of a sudden, nothing. I can't get to it via server name or IP. The server is still up and functioning, serving files, and I can get to it via telnet; it's just the console that appears to be dead. I'm seeing a lot of these errors in my syslog:
     Sep 7 17:27:24 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
     Sep 7 17:29:04 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
     Sep 7 17:30:44 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
     Sep 7 17:32:23 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
     Sep 7 17:34:03 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
     This is pretty much a fresh install, with no plugins and the stock 5.0 web GUI. SMB settings are as follows:
     Enable SMB: Yes (Workgroup)
     Workgroup: BAWDENHOME
     Local master: No Yes
     -Josh
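A small sketch for summarizing the avahi-daemon messages quoted above: it counts them per host and measures the gap between consecutive messages. It only handles the exact message shape shown in the post. The year is an assumption (syslog lines omit it), and the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# The five syslog lines quoted in the post.
SAMPLE = """\
Sep 7 17:27:24 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep 7 17:29:04 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep 7 17:30:44 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep 7 17:32:23 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
Sep 7 17:34:03 CASCADE avahi-daemon[1112]: Invalid response packet from host 192.168.1.101.
"""

LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+avahi-daemon\[\d+\]:\s+"
    r"Invalid response packet from host (?P<host>\d+(?:\.\d+){3})\.?$"
)

ASSUMED_YEAR = 2013  # assumption: syslog timestamps carry no year

def parse(log_text):
    """Return (timestamp, host) pairs for each matching avahi line."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line.strip())
        if m:
            when = datetime.strptime(
                f"{ASSUMED_YEAR} {m['stamp']}", "%Y %b %d %H:%M:%S"
            )
            events.append((when, m["host"]))
    return events

def summarize(events):
    """Count messages per host and list seconds between consecutive messages."""
    counts = Counter(host for _, host in events)
    times = [when for when, _ in events]
    gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
    return counts, gaps

counts, gaps = summarize(parse(SAMPLE))
print(counts)
print(gaps)
```

For the quoted lines, every message comes from 192.168.1.101 and they arrive about 100 seconds apart, which suggests one device on the LAN sending something avahi dislikes on a regular schedule. Whether that has anything to do with the console dying is not something this sketch can tell you.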
  6. I was having the same errors on my SuperMicro board. I honestly don't know what fixed it. I know Tom changed a timeout value in a later release and that helped, but it seemed to be more kernel related, as the newer kernel worked. My last BIOS update for my board was over 5 years ago, so I'm tempted to think it's mobo related. Boards keep getting faster, and sometimes I think new OSes just don't work as well on the older hardware. I also got a new flash drive and did notice it was faster. Sorry I'm not more help.
  7. System booted, no "USB not found" errors, configuration valid, array started, and parity is running right now. Will check again in the morning. Tom, thanks again for all your hard work. I know we don't say it enough. -Josh
  8. New flash drive is a SanDisk Fit. The old drive was getting 10MB/s, the new drive gets 31MB/s. Same result. I'll just wait for rc15. -Josh
  9. Thanks Tom. I may go out and get a new flash drive just to see if that makes a difference as well.
  10. I'm currently using a Lexar Firefly that is only a couple of years old. I've run a check disk on it and it comes back fine. I also tried another, newer USB drive I have lying around. Same thing, the system behaved the same way. Also, if the flash drive was going bad, I would not have expected rc13 to work. It seems to be related somehow to this: http://lime-technology.com/forum/index.php?topic=25250.msg220900#msg220900 For whatever reason the USB drive isn't mounting and it's only waiting so long. I do see a lot of "ata5: link is slow to respond" errors on the console, but I thought those were due to ATA ports on the mobo that are not in use.
      Jun 14 20:50:43 Tower kernel: sd 1:0:0:0: [sdb] Attached SCSI disk
      Jun 14 20:50:43 Tower kernel: scsi 8:0:0:0: Direct-Access Lexar JD FireFly 1100 PQ: 0 ANSI: 0 CCS
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] 3915776 512-byte logical blocks: (2.00 GB/1.86 GiB)
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Write Protect is off
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Mode Sense: 43 00 00 00
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through
      Jun 14 20:50:43 Tower kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
      Jun 14 20:50:43 Tower kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0
      Jun 14 20:50:43 Tower kernel: sd 2:0:0:0: Attached scsi generic sg2 type 0
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: Attached scsi generic sg3 type 0
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through
      Jun 14 20:50:43 Tower kernel: sdd: sdd1
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] No Caching mode page present
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Assuming drive cache: write through
      Jun 14 20:50:43 Tower kernel: sd 8:0:0:0: [sdd] Attached SCSI removable disk
      Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)
      Jun 14 20:50:43 Tower kernel: ata5: COMRESET failed (errno=-16)
      Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)
      Jun 14 20:50:43 Tower kernel: ata5: COMRESET failed (errno=-16)
      Jun 14 20:50:43 Tower kernel: ata5: link is slow to respond, please be patient (ready=-19)
      Jun 14 20:50:43 Tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  11. At the console, I'm seeing a number of "waiting for /dev/disk/by-label/UNRAID" messages. There are about 10 or so of them, then "/dev/disk/by-label/UNRAID not found. Mounting non-root local file systems." If -rc15 is coming out with the 3.9.6 kernel, I'll just see how that goes, as -rc13 worked. -Josh
  12. Thanks for all your hard work and awesome news!
  13. Can someone help me understand when in the process the USB drive gets mounted as /boot or as UNRAID, and is there a limit on how long it waits for the drive to be loaded before moving on? I'm trying to troubleshoot an error. rc10 worked, 11-12 no workie, 13 worked, 14 no workie. It appears to be related to the USB drive not getting mounted or being mounted improperly. The only thing I can somehow trace it to is the kernel. From my limited unix knowledge, it appears the drive is seen by the computer, but something is going on. The system comes up and emhttp doesn't appear or want to load. I can start it manually, but it has problems.
      root@Tower:/# ls /dev/disk/by-id/*
      /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0310325@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA2802606@
      /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0310325-part1@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA2802606-part1@
      /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA2802606@
      /dev/disk/by-id/usb-Lexar_JD_FireFly_TXVSZS46RZ0JRC7V5WG1-0:0@
      /dev/disk/by-id/ata-WDC_WD20EARS-00MVWB0_WD-WMAZA2802606-part1@
      /dev/disk/by-id/usb-Lexar_JD_FireFly_TXVSZS46RZ0JRC7V5WG1-0:0-part1@
      /dev/disk/by-id/ata-WDC_WD20EARS-00S8B1_WD-WCAVY2836928@
      /dev/disk/by-id/wwn-0x50014ee00261e6e5@
      /dev/disk/by-id/ata-WDC_WD20EARS-00S8B1_WD-WCAVY2836928-part1@
      /dev/disk/by-id/wwn-0x50014ee00261e6e5-part1@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WCAVY2836928@
      /dev/disk/by-id/wwn-0x50014ee2041dd087@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WCAVY2836928-part1@
      /dev/disk/by-id/wwn-0x50014ee2041dd087-part1@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0310325@
      /dev/disk/by-id/wwn-0x50014ee65608294e@
      /dev/disk/by-id/scsi-SATA_WDC_WD20EARS-00_WD-WMAZA0310325-part1@
      /dev/disk/by-id/wwn-0x50014ee65608294e-part1@
      root@Tower:/# ls /dev/disk/by-label/*
      /dev/disk/by-label/UNRAID@
      Should I try unmounting /boot, remounting it manually, and starting emhttp? Thanks, Josh syslog14a.txt
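To make the timing question concrete, here is a minimal sketch of a "wait for a device label, then give up" loop. It is not unRAID's actual boot code, which isn't shown anywhere in this thread; the function name, the 30-second timeout, and the 1-second poll interval are all assumptions for illustration.

```python
import os
import time

def wait_for_path(path, timeout_s=30.0, poll_s=1.0,
                  exists=os.path.exists, sleep=time.sleep):
    """Poll until `path` exists or `timeout_s` has elapsed.

    Returns True if the path showed up in time, False otherwise.
    `exists` and `sleep` are injectable so the loop can be exercised
    without real hardware or real waiting.
    """
    waited = 0.0
    while waited < timeout_s:
        if exists(path):
            return True
        sleep(poll_s)
        waited += poll_s
    # One last check after the final sleep, then give up.
    return exists(path)

# Example call (commented out so the sketch does not block when run):
# if not wait_for_path("/dev/disk/by-label/UNRAID"):
#     print("/dev/disk/by-label/UNRAID not found")
```

If the boot scripts work anything like this, a USB stick that enumerates slowly under one kernel could miss the window during boot and still show up in /dev/disk/by-label a few seconds later, which would match seeing the label from a shell after the fact.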
  14. Also hook up a monitor if you can. I had issues, but I don't think they were related to the NIC. If you can get a monitor and keyboard hooked up, you can troubleshoot better: type ifconfig to see your IP, and check your syslog by doing cat /var/log/syslog.
  15. I don't know what it is about that 3.4 kernel but my mobo doesn't like it. It never seems to find my USB drive. Mine doesn't even come up with rc14. I went back to 13 and it was fine. I've rebooted a few times with 13 and no issues with stopping the array. I feel for you Tom, trying to find something that works for everyone. syslog14.txt
  16. I'm up and running as well, this solved the issues I was having with 11 and 12a with not being able to come up. I'm still seeing some udevd worker failed errors but it's running which was more than before. Running parity now. -Josh
  17. It's an older system. I'm using the most current BIOS, but it's still from 2008.
      SuperMicro PDSMi http://www.supermicro.com/products/motherboard/pd/E7230/pdsmi.cfm
      - Intel® E7230 (Mukilteo) Chipset
      - 1x Intel® 82573L PCI-e Gigabit LAN
      - 1x Intel® 82573V PCI-e Gigabit LAN
      - SATA ICH7R Controller
      - ATI RageXL Graphics
      Intel Pentium D 2.8
      4GB Memory
      Lexar Firefly USB Drive
      WD Green Drives
      No add-in cards installed at this time.
  18. May want to add a note about taking a snapshot of users so you have that to restore from.
  19. I went back through the RC versions. It looks like I can get 10 to come up, and the array starts successfully. 10 has come up on me twice, and twice it has hung with a "Waiting for USB Subsystem" message on the screen for more than 5 minutes. The two times it's come up I've seen the same error on the screen, nothing in the log, but the system did eventually come up. I've attached the syslog from 10. Right now I'm waiting for the permission utility to do its thing before doing too much. So I know 10 is on kernel 3.4.24 and 11a is 3.4.26; is my problem likely a kernel error or an unRAID error? -Josh syslog10_running2.txt
  20. Yes, I did read through the wiki article but didn't see anything that stood out relating to this. I'm using the stock go file, which matches the wiki page. As for the release page: in the past I've tried just copying over the bzimage and bzroot files, and that didn't work. So I've set the drive up as I mentioned above: reformatted, with freshly downloaded 12a files (checksum matched) plus my Shares folder, disk.cfg, ident.cfg, network.cfg, share.cfg, and super.dat. At this point the server isn't "coming up all the way", as emhttp isn't running; it isn't getting to the go script. So the array is showing "Stopped: no devices". I can see my three disks, just not my flash drive, which is where all the config is. -Josh
  21. Still seems to be related to mounting the flash drive and/or the other drives. I've tried different flash drives and had the same results, so I don't think it's my flash drive. I've also tried different USB slots on the MB, same result. I was able to get emhttp manually started and got the web server up. It shows a flash drive but no info. I was able to get another syslog after I started emhttp. I tried 11a and got similar errors, but I only tried it briefly. syslog05242013_512a_w_emhttp.txt
  22. Update: I've reformatted my flash drive and put the 12a files on it, plus just the following files from my old install: Shares folder, disk.cfg, ident.cfg, network.cfg, share.cfg, super.dat. I left the go file from 12a and I didn't put a secrets.tdb file on it. Still no go, but I was able to telnet into the server and, with PuTTY, get some screenshots of the syslog. I'm seeing the following, which may be where it is stopping:
      May 25 20:55:07 Tower udevd[670]: worker [688] unexpectedly returned with status 0x0100
      May 25 20:55:07 Tower udevd[670]: worker [688] failed while handling '/devices/pci0000:00/0000:00:1f.2'
      May 25 20:55:07 Tower udevd[670]: worker [708] unexpectedly returned with status 0x0100
      May 25 20:55:07 Tower udevd[670]: worker [708] failed while handling '/devices/pci0000:00/0000:00:1c.5/0000:0e:00.0'
      May 25 20:55:07 Tower udevd[670]: worker [701] unexpectedly returned with status 0x0100
      May 25 20:55:07 Tower udevd[670]: worker [701] failed while handling '/devices/pci0000:00/0000:00:1e.0/0000:0f:02.0'
      May 25 20:55:08 Tower udevd[670]: worker [803] unexpectedly returned with status 0x0100
      May 25 20:55:08 Tower udevd[670]: worker [803] failed while handling '/devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2:1.0/host8/target8:0:0/8:0:0:0'
      I've heard about these being plugin related, but with fresh 12a files and only the files listed above, I don't have any plugins installed. I've attached the syslog. PS. I copied over my 4.7 files and the server boots right up. -Josh syslog05242013_512a.txt
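A quick sketch for pulling the failing device paths out of the udevd messages quoted above, so they are easier to compare across boots. The regex only covers the "failed while handling" lines as they appear in the post, and the function name is made up for illustration.

```python
import re

# The four "failed while handling" lines quoted in the post.
SAMPLE = """\
May 25 20:55:07 Tower udevd[670]: worker [688] failed while handling '/devices/pci0000:00/0000:00:1f.2'
May 25 20:55:07 Tower udevd[670]: worker [708] failed while handling '/devices/pci0000:00/0000:00:1c.5/0000:0e:00.0'
May 25 20:55:07 Tower udevd[670]: worker [701] failed while handling '/devices/pci0000:00/0000:00:1e.0/0000:0f:02.0'
May 25 20:55:08 Tower udevd[670]: worker [803] failed while handling '/devices/pci0000:00/0000:00:1d.7/usb1/1-2/1-2:1.0/host8/target8:0:0/8:0:0:0'
"""

FAILED = re.compile(
    r"udevd\[\d+\]: worker \[(?P<pid>\d+)\] failed while handling '(?P<dev>[^']+)'"
)

def failed_devices(log_text):
    """Return (worker_pid, device_path) for each udevd worker failure."""
    return [(int(m["pid"]), m["dev"]) for m in FAILED.finditer(log_text)]

def usb_failures(log_text):
    """Subset of failures whose device path goes through a USB bus."""
    return [dev for _, dev in failed_devices(log_text) if "/usb" in dev]

for pid, dev in failed_devices(SAMPLE):
    print(pid, dev)
print("USB-related:", usb_failures(SAMPLE))
```

Only the last failure is on a USB path, ending in 8:0:0:0. In the Jun 14 kernel log elsewhere on this page, 8:0:0:0 is the Lexar flash drive, so this may be the same device, though SCSI host numbers are not guaranteed to stay the same between boots.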
  23. Okay, I've done more troubleshooting.
      1. Ran chkdsk on the flash drive and it came back with no errors.
      2. Copied the entire root directory of 12a to my flash drive and ran make_bootable, and it's still acting the same.
      3. Tried removing the secrets.tdb file; nothing changed, it's still behaving the same.
      I was able to get 4.7 back up and running, almost like I had it before (I lost my root password change), and tried again with 12a, again with no luck. I have found this: when I got 4.7 running and looked at the files in /boot, all the files in the config directory had permissions of rwxrwxrwx (777). When I shut the server down, I saw a message on the screen about remounting the root filesystem read-only. Right after I stopped the array and powered down 4.7, I stuck the flash drive in a Linux machine I had, and the config directory and all the files now show rw-r--r-- (644). So something in the shutdown changed the permissions of the files, but all the files and directories are showing. So my problem is that something changed those permissions and it isn't letting unRaid see them when it boots back up, because when I boot 12a, at the console I only see the config directory with rwxr-xr-x (755) and no other files. Update: after more digging, it seems to be my powerdown script that is causing it to mount the files as read-only. I do have the Clean Powerdown package enabled and set to auto install. So should I just uninstall that? I see a command in the script, /bin/mount -v -n -o remount,ro /, that I'm guessing is causing the issue. I'm also wondering about using the original go command with the unmenu and package installs instead of the generic one. -Josh
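For reference, a short sketch that maps the octal modes mentioned in the post above to the strings ls prints, using Python's standard stat module. The helper names are hypothetical. One caveat, assuming the flash drive is FAT-formatted: FAT does not store Unix permissions, so the modes shown generally come from the mount options on whichever machine mounted it, and a difference between two machines may not mean anything changed on the drive itself.

```python
import stat

def mode_string(octal_mode, is_dir=False):
    """Render permission bits the way `ls -l` does, e.g. 0o644 -> '-rw-r--r--'."""
    type_bits = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(type_bits | octal_mode)

def anyone_can_write(octal_mode):
    """True if owner, group, or other has the write bit set."""
    return bool(octal_mode & 0o222)

# The three modes from the post: files under 4.7, files on the other
# Linux machine, and the config directory as seen when booting 12a.
for label, mode, is_dir in [
    ("4.7 config files", 0o777, False),
    ("same files on the Linux box", 0o644, False),
    ("config dir under 12a", 0o755, True),
]:
    print(f"{label}: {oct(mode)} {mode_string(mode, is_dir)}")
```

None of the three modes is read-only for the owner, so the permission strings alone don't explain files going missing. The read-only remount in the powerdown script applies to / rather than /boot, going by the command quoted in the post.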