rockytt

Members
  • Posts: 129

Everything posted by rockytt

  1. This started in the last couple of months, and I've been putting off (living dangerously!) seeking help here for the problem. The server locks up every few weeks and I have to do an unclean shutdown, as it doesn't respond to any commands via console or web interface. When I mount the disks, of course it starts a parity check, and then the system locks up again after 30 seconds or so. I can start up the array w/o doing the parity check and read data off all disks w/o an issue - I get 12 happy/green balls and no indication that anything is wrong - until I have to do another unclean shutdown, and then we get the same issue. I'm sure I'm overdue for some sort of hardware failure - but would rather not just start replacing stuff randomly - was hoping someone had experienced the same thing and could point me in a (helpful) direction - thanks!
  2. Thanks Gary- Instead of running the memtest, I just swapped out the memory stick for another. Don't see any obvious signs of cap failure (which doesn't necessarily mean anything) - but I think I'll go with the PS suggestion first if the new stick doesn't clear things up. MUCH easier to swap a PS than a MB, that's for sure! Since my MB is 6 years old now - how difficult is it to change it out for a newer one? Simply pull the 2 SATA cards and reinsert into the new board (same positions) and make note of the 4 SATA connections on the MB? (Obviously take a screenshot of the disks/positions for reference as well) Appreciate the help and we'll keep our fingers crossed for now
  3. Login prompt is still up - nothing changed, although now I can't do anything with the keyboard (and the array went away also). Unplugged/plugged in cables and swapped out the one for the server - still nothing.
  4. Hi Gary, Nothing unusual that I can see - goes to command prompt and I never lose control/access via keyboard. When array "disappears", I can't do anything over the network. Tower is no longer accessible via either browser or Windows explorer.
  5. Array (11+1 drives) running 4.7 started behaving strangely last night and disappeared from my network, although command line control was fine. Rebooted and it worked fine for a few minutes before disappearing again. System is probably 5 (?) years old and still using the original flash drive I got from Tom when I purchased the license. Any way to really test the flash drive to see if it's starting to die? Because of the license issues, I can't just copy the files to another USB stick and see if that clears things up, can I - at least not without obtaining a new key? I've attached a syslog just to see if there's anything hiding in there that might clear this up - syslog.txt
  6. Nope - that was a separate issue that I thought I'd clear up while I was here and everyone was being so helpful. My best guess is that the MB (or USB controller) is dying, as everything cleared up once I found a third (USB) slot to try the flash drive in.
  7. Good to go - (feel like an idiot for that last one!) Thanks again to all!
  8. Nope - 4.7. I can actually see from the log which physical disk the files are on - just curious as to why they are listed as duplicates if there is only one "copy" showing on the disk.
  9. OK - I'm cautiously optimistic. Rechecked the cables last night and gave 'em all a good jiggle - nothing obvious, but who knows? In the booting/rebooting over the last couple of days I'd tried different usb ports for the flash drive and gotten slightly different results. Found another port last night I didn't know existed and booted from there - and now the syslog looks almost completely clean(!) I'm leaning even more strongly towards a failing MB, or at least controller now. (Unless someone has a different theory) Thanks for all the help - amazing how much anxiety this box brings me every couple of years! Unrelated question, but didn't want to start a new thread. I searched on this fairly extensively, but couldn't find my exact situation: I'm getting several of these "Tower shfs: duplicate object: /mnt/disk3/***" errors in the log file - but the kicker is that the movies are NOT showing up on multiple disks. I would expect to find the same movie listed on 2 or more disks in the log file, but they are all showing up on single disks only. i.e. I'll see this line repeated with a different time stamp, but only on disk 9. If it were a duplicate object, wouldn't it appear on multiple disks? Thanks again everyone!
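One way to double-check the "duplicate object" question in post 9 is to look for the same relative path under more than one disk mount. The following is only a minimal sketch: the function name `find_duplicates` is hypothetical, and it assumes the data disks are mounted at paths such as `/mnt/disk3` and `/mnt/disk9`, as the log lines in the post suggest. It is not part of unRAID itself.

```python
import os
from collections import defaultdict


def find_duplicates(disk_roots):
    """Return {relative_path: [roots...]} for files present under 2+ disk roots.

    disk_roots is a list of mount points (assumed, e.g. "/mnt/disk3").
    """
    seen = defaultdict(list)
    for root in disk_roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Path relative to the disk root, so the same file on
                # two disks produces the same key.
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                seen[rel].append(root)
    # Keep only paths that showed up on more than one disk.
    return {rel: roots for rel, roots in seen.items() if len(roots) > 1}


if __name__ == "__main__":
    # Hypothetical mount points; adjust to the disks named in the syslog.
    for rel, roots in sorted(find_duplicates(["/mnt/disk3", "/mnt/disk9"]).items()):
        print(rel, "->", ", ".join(roots))
```

If this prints nothing for the disks named in the log, the path really does exist on only one of them, which would support the suspicion in the post that something other than a true duplicate is being reported.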
  10. Ran memtest for several hours - no errors reported. Am viewing the syslog right now via telnet - here's an attachment - Seeing some error msgs that hopefully could point a finger at what's going on with my box. (Still running 4.7 btw) syslog.txt
  11. temps are good and cpu fan is spinning merrily away right now
  12. OK - let it run for 24 hours - but the keyboard still wouldn't accept any input - frozen at "Tower login:". Ran checkdisk on the flash - checked out just fine. Had another stick of memory from a working computer and swapped that in - initially seemed to work - got to the web GUI and a parity check was started, but 10 minutes later the keyboard was frozen and I couldn't hit the array anymore. As far as the network goes, both the server and my computer are plugged into the same switch, and every other computer in the house (3 others) has no connection issues. Could it be anything other than a MB that has decided to pick this moment to die?
  13. Thanks for all the suggestions - I'll give 'em a go today and see what pops up. Windows explorer didn't pan out - although that was a good shot. Leaning towards a network issue, although the keyboard/server locking up is a bit disconcerting...
  14. I realize that was a bit vague. Just watched the screen as it booted and it ran through its (apparent) usual process - up to and including the login prompt. (As I type this, I realize that it probably doesn't mean much...)
  15. Very strange - was just watching a movie with the kids when the movie kind of hiccuped (for lack of a better word) and suddenly the array disappeared. Went to the server and did a hard shutdown (shutdown -h now). It appeared to reboot normally, but was still not visible on the network, and it doesn't respond to the keyboard anymore - any thoughts on what I might try next? I'm (really) in uncharted waters now...
  16. Thanks Joe - I think that did the trick. Popped in the original baby data drive, a new parity drive, reconfigured the array (initconfig) and am happily sitting at 63% with the new Parity-Sync. Once this finishes sometime tonight, I'll upgrade the little data drive, rebuild the data there and be a very happy camper. Thanks again!
  17. Couldn't add the old disk (error msg - disk too small) but here's a SMART report from the parity drive:

      smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
      Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

      === START OF INFORMATION SECTION ===
      Model Family: Western Digital Caviar Green family
      Device Model: WDC WD20EADS-00S2B0
      Serial Number: WD-WCAVY2631433
      Firmware Version: 01.00A01
      User Capacity: 2,000,398,934,016 bytes
      Device is: In smartctl database [for details use: -P show]
      ATA Version is: 8
      ATA Standard is: Exact ATA specification draft version not indicated
      Local Time is: Thu Mar 7 09:15:59 2013 PST
      SMART support is: Available - device has SMART capability.
      SMART support is: Enabled

      === START OF READ SMART DATA SECTION ===
      SMART overall-health self-assessment test result: PASSED

      General SMART Values:
      Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled.
      Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run.
      Total time to complete Offline data collection: (41160) seconds.
      Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported.
      SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer.
      Error logging capability: (0x01) Error logging supported. General Purpose Logging supported.
      Short self-test routine recommended polling time: ( 2) minutes.
      Extended self-test routine recommended polling time: ( 255) minutes.
      Conveyance self-test routine recommended polling time: ( 5) minutes.
      SCT capabilities: (0x303f) SCT Status supported. SCT Feature Control supported. SCT Data Table supported.

      SMART Attributes Data Structure revision number: 16
      Vendor Specific SMART Attributes with Thresholds:
      ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
        1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           0
        3 Spin_Up_Time            0x0027 148   144   021    Pre-fail Always  -           9583
        4 Start_Stop_Count        0x0032 099   099   000    Old_age  Always  -           1600
        5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
        7 Seek_Error_Rate         0x002e 100   253   000    Old_age  Always  -           0
        9 Power_On_Hours          0x0032 071   071   000    Old_age  Always  -           21478
       10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
       11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
       12 Power_Cycle_Count       0x0032 100   100   000    Old_age  Always  -           235
      192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           75
      193 Load_Cycle_Count        0x0032 195   195   000    Old_age  Always  -           16918
      194 Temperature_Celsius     0x0022 127   095   000    Old_age  Always  -           25
      196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
      197 Current_Pending_Sector  0x0032 199   199   000    Old_age  Always  -           450
      198 Offline_Uncorrectable   0x0030 200   200   000    Old_age  Offline -           207
      199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
      200 Multi_Zone_Error_Rate   0x0008 192   190   000    Old_age  Offline -           1732

      SMART Error Log Version: 1
      No Errors Logged

      SMART Self-test log structure revision number 1
      No self-tests have been logged. [To run self-tests, use: smartctl -t]

      SMART Selective self-test log data structure revision number 1
      SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
         1       0       0 Not_testing
         2       0       0 Not_testing
         3       0       0 Not_testing
         4       0       0 Not_testing
         5       0       0 Not_testing
      Selective self-test flags (0x0):
        After scanning selected spans, do NOT read-scan remainder of disk.
      If Selective self-test is pending on power-up, resume after 0 minute delay.
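The report in post 17 says PASSED overall, yet two raw values stand out: Current_Pending_Sector (450) and Offline_Uncorrectable (207). As a rough sketch, the attribute rows can be scanned for non-zero raw values on a short watch list. The watch list and the function name `flag_attributes` are assumptions for illustration (a common rule of thumb, not a smartctl feature), and the sample rows are copied from the report above.

```python
import re

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: attributes whose raw value is normally zero on a healthy disk.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def flag_attributes(report: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m and m.group(2) in WATCH and int(m.group(10)) > 0:
            flagged[m.group(2)] = int(m.group(10))
    return flagged


# Rows taken from the SMART report in the post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 200 200 140 Pre-fail Always  - 0
197 Current_Pending_Sector  0x0032 199 199 000 Old_age  Always  - 450
198 Offline_Uncorrectable   0x0030 200 200 000 Old_age  Offline - 207
199 UDMA_CRC_Error_Count    0x0032 200 200 000 Old_age  Always  - 0
"""

if __name__ == "__main__":
    print(flag_attributes(SAMPLE))
```

The "Pre-fail" and "Old_age" labels in the TYPE column describe the kind of attribute, not a warning by themselves, which is why the sketch looks at the raw values instead.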
  18. Not to be dense - but I don't see that function (Utils->New Config) on 4.7. But yes, I do have the disk.
  19. Well, just wanted to update the size of the array by getting rid of the "baby". Removed the 500gb disk and replaced it with a 2tb. Fired up the array and it automatically formatted the drive and began restoring the data. After it "finished" 30 hours later, there were several thousand errors in the syslog (as listed in the first post). I don't think I left out any steps - as that's all I did. (Just to update this again - popped in a new parity drive and it won't accept that either as it's looking for the old one... Red dot on the new parity drive and orange on the other new drive)
  20. One thought I had was to replace the parity drive and put the baby (500gb) drive back in its slot. That would allow parity to be rebuilt on a new disk and then I could go back and swap the baby drive for a 2tb one. No love - 2 red balls (Too many wrong and/or missing disks!) on the two drives. Any thoughts? Ideas? I'd swap out both drives at the same time and then just copy the data from the baby drive back onto the array at a later date - but I'd imagine I'd keep getting the same error as above...
  21. (Running 4.7) Had a couple of smaller disks in my array (500gb) and went to swap one out for a 2tb that I just picked up. Had just run a parity check and it showed no errors, so I pulled the baby drive and popped in the larger one. 30 hours later, parity drive shows a couple thousand errors - so the array was extremely unhappy to say the least. Syslog shows a bunch of "Tower kernel: handle_stripe read error: 21009472/0, count: 1 Mar 5 18:07:48 Tower kernel: md: disk0 read error" lines and SMART shows several "Pre-fail" and "Old_age" warnings that I assume are for disk 0. Gotta be the parity drive - yes? Problem is, I can't just pop in the baby drive and replace the parity drive with a new one as I get a "replacement disk is too small" in reference to the 500gb drive. I'm sure there's a command or two that will allow me to do this, but I want to make sure before I hit one of those "I'm sure I want to do this - no really, I'm REALLY sure I want to do this" check boxes... Anyway - thanks for the help on this!
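To see which disk the syslog read errors in post 21 point at, the "md: diskN read error" lines can be tallied per disk number. This is a small sketch only: the function name `count_read_errors` is hypothetical, the sample lines are the two quoted in the post, and reading disk0 as the parity slot follows the post's own assumption.

```python
import re
from collections import Counter

# Matches the md driver line quoted in the post, e.g. "md: disk0 read error".
MD_READ_ERROR = re.compile(r"md: disk(\d+) read error")


def count_read_errors(lines):
    """Count 'md: diskN read error' lines per disk number N."""
    counts = Counter()
    for line in lines:
        m = MD_READ_ERROR.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts


# The two log lines quoted in the post.
SAMPLE = [
    "Tower kernel: handle_stripe read error: 21009472/0, count: 1",
    "Mar 5 18:07:48 Tower kernel: md: disk0 read error",
]

if __name__ == "__main__":
    for disk, n in sorted(count_read_errors(SAMPLE).items()):
        print(f"disk{disk}: {n} read error(s)")
```

Run against the full syslog.txt attachment (for example by passing an open file object as `lines`), a tally dominated by a single disk number would back up the guess that one drive is responsible.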
  22. Apologies for asking a question that's probably pretty obvious to everybody else - but here goes... (And yes, I searched all the threads I could find before posting) Long story short: I have a 2tb disc from an unRaid system of mine that I'd like to be able to access from a Windows (Vista) machine to copy back onto the aforementioned unRaid box. When I connect the disc to the Vista machine it shows as fully unallocated/uninitialized - which is what's supposed to happen - yes? I first run rfstool (as administrator) and get the DOS box that shows it's checking disc 0 and disc 1 - then the window closes. When I run YAReG (also as admin), I don't see any disc information. Question - disc issue/data corruption or did I do something wrong?