ridge

Members
  • Posts

    115

Everything posted by ridge

  1. So I'm back to (what looks like) a stable system. I rolled back to beta 6a, and the system has been up without a problem for 5 hours, approximately 10x longer than it was under beta 14 after the hard drive troubles. I'm rebuilding the array right now with the new hard disk, which is going well. I'll post back tomorrow when it's done. Meanwhile, I find it very odd that 6a works just fine when 14 was reporting what seemed to be hardware issues beyond those of the hard drive, AND was crashing the server every few minutes...
  2. Server crashed again with: INFO: rcu_sched_state detected stall on CPU 0 (t=1376280 jiffies) lines repeating down the page.
  3. I also replaced the preclear script with the latest version, FYI.
  4. OK, thanks dgaschk. Did all that. Server back up and running (after finding settings to make the Flash drive boot!). Now, after no errors in memtest after a solid 24 hour test, what's next for diagnosis?
  5. Ran memtest for 24 hours without any errors. Changed the BIOS clock speed as per Joe's advice to the settings specified for the memory (clock speed to 667MHz was the only one I could change) and rebooted. Now the server is completely unresponsive. Monitor won't come on and I can't get back to the BIOS! Help!
  6. Set preclear running and went out. Came back to babysit and found this:

     ================================================================== 1.9
     = unRAID server Pre-Clear disk /dev/sdi
     = cycle 1 of 1, partition start on sector 64
     = Disk Pre-Read in progress: 4% complete
     = ( 93,768,192,000 bytes of 2,000,398,934,016 read ) 117 MB/s
     =
     =
     =
     =
     =
     =
     =
     =
     =
     = Disk Temperature: 32C, Elapsed Time: 0:14:06

     Message from syslogd@Gallifrey at Sat Jan 7 14:00:02 2012 ...
     Gallifrey kernel: EIP: [<c10889d7>] path_lookupat+0x2ef/0x4f9 SS:ESP 0068:c52fbe48

     System crashed again. Time for memtest? Additionally, will this failure during preclear negatively impact the new drive at all?
  7. Ha! Well now I feel like a moron! Thanks! Realized I also needed to preclear the disk (been a while since I did this the first time) so I'm doing that right now. When it's finished and added to the array (hopefully parity will have rebuilt the contents) I'll post back.
  8. OK, my inexperience is showing. I bought a replacement 2TB drive, followed the instructions here http://lime-technology.com/wiki/index.php?title=Replacing_a_Data_Drive, only to now have Disk 9 showing that it's not installed, and seemingly no way forward. Attached image.
  9. OK, off to get a new drive. I'll post more when that's set up.
  10. Connected up a monitor (with the intention of running memtest) to be greeted by a whole page of this: INFO: rcu_sched_state detected stall on CPU 0 (t=1376280 jiffies) Only the numbers after "t=" are different on each line. CPU stalling?
  11. reiserfsck didn't finish:

      ########### reiserfsck --check started at Fri Jan 6 07:18:59 2012 ###########
      Replaying journal: Done.
      Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed
      Checking internal tree.. \/ 7 (of 16\/ 70 (of 170\/116 (of 170\

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: EIP: [<c1092787>] mntget+0x7/0xf SS:ESP 0068:c7c0bdf8

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: CR2: 00000000213cfe98

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: Call Trace:

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: Code: 00 00 00 8b 00 e8 81 ff ff ff 85 c0 74 06 8b 50 18 64 ff 02 ba 80 49 4a c1 64 03 15 14 40 4a c1 fe 02 5d c3 55 85 c0 89 e5 74 06 <8b> 50 18 64 ff 02 5d c3 55 ba 80 49 4a c1 89 e5 53 8b 58 48 64

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: Stack:

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: Process fuser (pid: 15007, ti=c7c0a000 task=f09b2520 task.ti=c7c0a000)

      Message from syslogd@Gallifrey at Fri Jan 6 08:00:13 2012 ...
      Gallifrey kernel: Oops: 0000 [#1] SMP

      Now what?
  12. SMART report attached. I'm running a reiserfsck --check on the drive now too. Will post results shortly. smart.txt
  13. Thanks. I got that much. From the syslog I posted, does it look like there's anything else going on I should be aware of before replacing the drive and rebuilding the parity?
  14. Running 5.0beta14. Server went down while watching TV earlier. Couldn't access the GUI, and couldn't access via telnet. Had to do a hard reboot. On reboot, unRAID did a parity check and found one drive with a lot of errors. Screenshot is attached (unfortunately it's the only one I got). Array came back up, but I was completely unable to run anything via console because the system was generating error message after error message without stopping. I couldn't escape out of it. Eventually the server stopped responding and I had to do a hard power down. Attached is a small portion of my syslog (the whole thing zipped is 14MB, so I just took the first 8800 lines or so and zipped that). I'm hoping this is as simple as a dead drive, but wanted some expert guidance based on the syslog before I go ripping out the drive. What makes me most nervous is that I can't do anything on the server via the console when it's up, because it's generating system messages so fast without stopping! So, next steps? I can run to Fry's in the morning and buy a replacement drive if necessary. Does it look like that's the only problem? Thanks for any insight you guys can offer! Edited to add syslog. syslog.txt.zip
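
For anyone facing a syslog too large to read by eye, as in the posts above, here is a rough Python sketch that groups repeated messages so the dominant errors stand out. It is only an illustration: the `summarize` helper, the prefix pattern, and the sample lines (including their timestamps) are assumptions made up for the example, shaped like the kernel messages quoted in these posts, and not taken from the actual attached syslog.

```python
import re
from collections import Counter

# Assumed syslog prefix: "Mon DD HH:MM:SS hostname " (hypothetical pattern).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+")
# Runs of digits are replaced so lines that differ only in numbers group together.
NUMBERS = re.compile(r"\d+")


def summarize(lines, top=5):
    """Return the most common messages after stripping the prefix and numbers."""
    counts = Counter()
    for line in lines:
        message = PREFIX.sub("", line.strip())
        if message:
            counts[NUMBERS.sub("N", message)] += 1
    return counts.most_common(top)


# Made-up sample lines, not real log data.
sample = [
    "Jan  6 07:00:01 Gallifrey kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=1376280 jiffies)",
    "Jan  6 07:00:31 Gallifrey kernel: INFO: rcu_sched_state detected stall on CPU 0 (t=1376310 jiffies)",
    "Jan  6 07:01:00 Gallifrey kernel: Oops: 0000 [#1] SMP",
]

for message, count in summarize(sample):
    print(count, message)
```

To run it against a real file, pass an open file object instead of `sample`, for example `summarize(open("syslog.txt", errors="replace"))`, assuming the zip has been extracted to that name.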