<SOLVED> WTF happened..

July 22, 2011

I am using unRAID Server version 4.6 on one of my servers. It is a 20-drive server with two Supermicro AOC-SASLP-MV8 cards on a Supermicro motherboard. My system acted up while I was recording some data to the server: I got a data I/O error in AnyDVD. I decided to shut down the server and restart it. The system hung on boot, reporting an HD failure on the second MV8 card. I reseated the drive in question and restarted. It booted this time, but the main page showed 3 drives missing. I then reset the system one more time, and this time it showed all the drives, but the parity drive had a red dot. I realize I should have stopped right there, but I decided what the hell, I will try to redo parity using the INITCONFIG command. Well, my system hung in the process, and this is where I am. Can someone please advise? I have attached my syslog (I know I should have stopped as soon as the error occurred and grabbed the syslog then, but my impatience got the best of me).

Update: I swapped out the MV8 card that was connected to the drives showing problems, and the server booted smoothly. I tried to rerun parity, but the system froze 6 to 8 hours into it. The console on the unRAID server printed a bunch of codes, [<C....>], very similar to the ones in this other post:

http://lime-technology.com/forum/index.php?topic=12192.msg116061#msg116061

Apr  5 15:44:19 Tower kernel:  [<c10244e9>] warn_slowpath_fmt+0x24/0x27
Apr  5 15:44:19 Tower kernel:  [<c123b505>] dev_watchdog+0xff/0x17f
Apr  5 15:44:19 Tower kernel:  [<c1037139>] ? sched_clock_cpu+0x136/0x14a
Apr  5 15:44:19 Tower kernel:  [<c123b406>] ? dev_watchdog+0x0/0x17f
Apr  5 15:44:19 Tower kernel:  [<c102bb23>] run_timer_softirq+0x105/0x158
Apr  5 15:44:19 Tower kernel:  [<c1028261>] __do_softirq+0x84/0xf8
Apr  5 15:44:19 Tower kernel:  [<c10282fb>] do_softirq+0x26/0x2b
Apr  5 15:44:19 Tower kernel:  [<c1028556>] irq_exit+0x29/0x2b
Apr  5 15:44:19 Tower kernel:  [<c10118f0>] smp_apic_timer_interrupt+0x6f/0x7d
Apr  5 15:44:19 Tower kernel:  [<c10031f6>] apic_timer_interrupt+0x2a/0x30
Apr  5 15:44:19 Tower kernel:  [<c10085f9>] ? mwait_idle+0x4c/0x52
Apr  5 15:44:19 Tower kernel:  [<c12108ad>] cpuidle_idle_call+0x28/0x9b
Apr  5 15:44:19 Tower kernel:  [<c1001a14>] cpu_idle+0x3a/0x4e
Apr  5 15:44:19 Tower kernel:  [<c129c662>] start_secondary+0x195/0x19a
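
For anyone digging through a syslog for traces like the one above, here is a minimal Python sketch that pulls the function names out of each frame. It assumes the line layout shown in the trace (`[<address>] symbol+0xoffset/0xsize`, with an optional `?` in front of frames the kernel is unsure about). The names `Frame`, `parse_trace_line`, and `extract_frames` are made up for illustration and are not part of unRAID or any kernel tool.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches one call-trace frame, e.g. "[<c123b505>] dev_watchdog+0xff/0x17f".
# An optional "? " before the symbol marks a frame the kernel could not verify.
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


class Frame(NamedTuple):
    address: int    # return address, parsed from hex
    symbol: str     # kernel function name
    reliable: bool  # False when the frame was prefixed with "?"


def parse_trace_line(line: str) -> Optional[Frame]:
    """Return the frame found in a syslog line, or None if there is none."""
    match = TRACE_RE.search(line)
    if match is None:
        return None
    return Frame(
        address=int(match.group("addr"), 16),
        symbol=match.group("symbol"),
        reliable=match.group("unsure") is None,
    )


def extract_frames(lines: Iterable[str]) -> List[Frame]:
    """Collect every call-trace frame from an iterable of syslog lines."""
    return [frame for frame in map(parse_trace_line, lines) if frame is not None]
```

Feeding it the lines of a syslog (for example `extract_frames(open("syslog"))`, where the file name is just a placeholder) gives a list of frames that can be compared between two crashes to see whether they go through the same functions.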

I decided to run a memory test last night, and by this morning it had found two errors. I ordered some new RAM today, so we shall see what happens when it comes in and I swap it.

It seems it was a combination of problems. I had to swap out one of my MV8 cards and replace the memory. After that I was able to run a parity check without it dying on me. It reported a few errors, but I am rerunning it as I type this. I only have one other problem: the drives in my Norco 4220 case got really hot (the parity drive hit 50C). For the moment I have a fan blowing directly into the case while the parity check runs to keep it cool (down to 30C). I decided to get some fan controllers so I can run my internal fans harder.

syslog.zip
