Stuck at formatting


Greetings,

 

I added two hard drives (500GB & 250GB) to my raid yesterday. 24 hours later, the web interface says the drives are still formatting. Any suggestions on how to make sure everything is going OK? I don't want to just pull the plug, so to speak, but I don't have a way via the web interface to stop the raid or power it down.

 

My setup is a 1TB parity drive, 3x500GB, 250GB, 160GB.

 

Thanks.

Link to comment

Have you refreshed the browser? It does not auto-refresh. Formatting takes about a minute or so. Clearing the disks, if they are not pre-cleared, takes 4-6 hours. If you did not press the "Start" button to start the array, it will wait forever for you to press it.

 

What did you press last? 

 

To analyze further, we need to see a copy of the syslog. Instructions on how to grab a copy are in the wiki under troubleshooting.

 

If you can log in, there are ways to shut down. Can you log onto the server?

 

Joe L.

Link to comment

Thanks for the quick reply! Here are the answers to your questions.

 

Yes, I have hit the refresh button, tried a different browser, and restarted the computer that I'm logging in from.

I don't have the option to press start.

I think I just pressed "Format" last.

 

Attached is a shortened copy of my syslog; the full log is over 1MB and too big to post.

 

Thanks again for the help.

syslog_04_27_10_18_46_2.txt

Link to comment

According to your syslog, your server attempted to mount the two new drives and failed because they did not have a file system on them (this is expected, since it has not yet formatted them):

Apr 26 06:48:26 Tower emhttp: disk5 mount error: 32
Apr 26 06:48:26 Tower emhttp: shcmd (62): rmdir /mnt/disk5
Apr 26 06:48:26 Tower emhttp: _shcmd: shcmd (62): exit status: 32
Apr 26 06:48:26 Tower emhttp: disk4 mount error: 32
Apr 26 06:48:26 Tower emhttp: shcmd (63): rmdir /mnt/disk4
Apr 26 06:48:26 Tower kernel: REISERFS warning (device md5): sh-2021 reiserfs_fill_super: can not find reiserfs on md5
Apr 26 06:48:26 Tower kernel: REISERFS warning (device md4): sh-2021 reiserfs_fill_super: can not find reiserfs on md4
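For anyone reading along who wants to pull the same information out of their own log: the mount(8) man page lists exit status 32 as a generic "mount failure". A rough sketch for listing which disks emhttp failed to mount follows. The function name failed_mounts is made up for illustration, and the pattern assumes the log lines look like the ones quoted above.

```shell
# List each disk that emhttp reported a mount error for, with the error code.
# failed_mounts is a hypothetical helper, not part of unRAID.
# Assumes syslog lines of the form "... emhttp: diskN mount error: CODE".
failed_mounts() {
    sed -n 's/.*emhttp: \(disk[0-9][0-9]*\) mount error: \([0-9][0-9]*\).*/\1 \2/p' "$1" |
        sort -u
}
```

Run it as failed_mounts /var/log/syslog (or against a saved copy of the log). It only reads the file.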

 

Then it experienced a kernel "oops": the kernel detected that it was attempting to access memory through a null pointer. This is not normal.

Apr 26 06:48:32 Tower kernel: BUG: unable to handle kernel NULL pointer dereference at 00000010
Apr 26 06:48:32 Tower kernel: IP: [<c1030b21>] wq_per_cpu+0x1/0x19
Apr 26 06:48:32 Tower kernel: *pdpt = 0000000036e6c001 *pde = 0000000000000000
Apr 26 06:48:32 Tower kernel: Oops: 0000 [#1] SMP
Apr 26 06:48:32 Tower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/stat
Apr 26 06:48:32 Tower kernel: Modules linked in: md_mod xor r8169 pata_jmicron jmicron ahci [last unloaded: md_mod]
Apr 26 06:48:32 Tower kernel:
Apr 26 06:48:32 Tower kernel: Pid: 2224, comm: emhttp Not tainted (2.6.32.9-unRAID #1) EX58-UD5
Apr 26 06:48:32 Tower kernel: EIP: 0060:[<c1030b21>] EFLAGS: 00210246 CPU: 2
Apr 26 06:48:32 Tower kernel: EIP is at wq_per_cpu+0x1/0x19
Apr 26 06:48:32 Tower kernel: EAX: 00000000 EBX: f860a134 ECX: f860a134 EDX: 00000002
Apr 26 06:48:32 Tower kernel: ESI: f860a144 EDI: 00000000 EBP: e0009e1c ESP: e0009e04
Apr 26 06:48:32 Tower kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Apr 26 06:48:32 Tower kernel: Process emhttp (pid: 2224, ti=e0008000 task=f767cdc0 task.ti=e0008000)
Apr 26 06:48:32 Tower kernel: Stack:
Apr 26 06:48:32 Tower kernel:  e0009e1c c103134e ffffffff c2de4cd4 c2de4c80 f78f4000 e0009e28 c10313cc
Apr 26 06:48:32 Tower kernel: <0> 0000000a e0009eb0 c10c0203 03dd3578 f75dc6e0 f74d5e00 e0009ed0 00000010
Apr 26 06:48:32 Tower kernel: <0> 00000000 f85fa000 c7ff0000 f78fa000 e8837140 00000001 00000000 00000004
Apr 26 06:48:32 Tower kernel: Call Trace:
Apr 26 06:48:32 Tower kernel:  [<c103134e>] ? queue_delayed_work_on+0x44/0x94
Apr 26 06:48:32 Tower kernel:  [<c10313cc>] ? queue_delayed_work+0x1b/0x1e
Apr 26 06:48:32 Tower kernel:  [<c10c0203>] ? do_journal_end+0x935/0xb33
Apr 26 06:48:32 Tower kernel:  [<c10c045c>] ? journal_end_sync+0x5b/0x63
Apr 26 06:48:32 Tower kernel:  [<c10b4133>] ? reiserfs_sync_fs+0x32/0x4f
Apr 26 06:48:32 Tower kernel:  [<c10850d4>] ? __sync_filesystem+0x38/0x49
Apr 26 06:48:32 Tower kernel:  [<c108522b>] ? sync_filesystem+0x2c/0x41
Apr 26 06:48:32 Tower kernel:  [<c106da13>] ? do_remount_sb+0x45/0xab
Apr 26 06:48:32 Tower kernel:  [<c107e7b8>] ? do_mount+0x1c9/0x5f9
Apr 26 06:48:32 Tower kernel:  [<c1016e90>] ? do_page_fault+0x0/0x1e4
Apr 26 06:48:32 Tower kernel:  [<c1054f19>] ? strndup_user+0x40/0x5f
Apr 26 06:48:32 Tower kernel:  [<c107ec49>] ? sys_mount+0x61/0x94
Apr 26 06:48:32 Tower kernel:  [<c1002935>] ? syscall_call+0x7/0xb
Apr 26 06:48:32 Tower kernel: Code: 18 a6 43 c1 89 e5 85 c0 75 04 0f 0b eb fe 8b 14 95 64 ff 3e c1 8b 00 64 8b 0d 54 a4 43 c1 3b 4c 10 20 5d 0f 94 c0 0f b6 c0 c3 55 <83> 78 10 00 89 e5 0f 45 15 5c 00 3f c1 8b 00 5d 03 04 95 64 ff
Apr 26 06:48:32 Tower kernel: EIP: [<c1030b21>] wq_per_cpu+0x1/0x19 SS:ESP 0068:e0009e04
Apr 26 06:48:32 Tower kernel: CR2: 0000000000000010
Apr 26 06:48:32 Tower kernel: ---[ end trace eb08f767687d737b ]---

 

It appears to reference /dev/sda, but it is difficult to interpret since you only included the tail end of the syslog.
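If the full syslog is too large to attach, one option is to post each oops together with the lines leading up to it rather than just the tail of the file. Here is a minimal sketch of that idea. extract_oops is a hypothetical name, and the "kernel: BUG:" and "end trace" markers are assumed to match the format shown in the excerpt above.

```shell
# Print every kernel oops in a log, from its "BUG:" line through the
# matching "end trace" line, preceded by up to N lines of context.
# extract_oops is a hypothetical helper; N defaults to 20.
extract_oops() {
    log="$1"
    context="${2:-20}"
    awk -v ctx="$context" '
        { buf[NR] = $0 }                      # remember every line for context
        /kernel: BUG:/ && !inside {
            inside = 1
            start = NR - ctx
            if (start < 1) start = 1
            for (i = start; i < NR; i++) print buf[i]
        }
        inside { print }                      # lines belonging to the oops
        /end trace/ { inside = 0 }
    ' "$log"
}
```

For example, extract_oops /var/log/syslog 50 > oops.txt would keep 50 lines of lead-up before each oops. The whole file is held in memory, which is fine for a log of a megabyte or so.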

 

To cleanly stop the server right now, you can try this series of commands:

cd /
samba stop
umount /mnt/disk1
umount /mnt/disk2
umount /mnt/disk3
mdcmd stop
reboot
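The disk numbers in that list are specific to this server. As a sketch, assuming array disks are mounted at /mnt/diskN as in the syslog above, the following prints the umount commands for whatever is currently mounted without running them. plan_unmounts is a hypothetical helper name, not an unRAID command.

```shell
# Print one "umount" command per mounted /mnt/diskN entry in a mounts table
# (same format as /proc/mounts). This is a dry run: nothing is unmounted.
# plan_unmounts is a hypothetical helper; it reads /proc/mounts by default.
plan_unmounts() {
    awk '$2 ~ /^\/mnt\/disk[0-9]+$/ { print "umount " $2 }' "${1:-/proc/mounts}"
}
```

Review the printed list first, then run the commands by hand between the samba stop and mdcmd stop steps.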

 

The last two messages in the syslog seem the most puzzling, since Linux programs do not normally have ".exe" suffixes.

Apr 27 05:47:12 Tower kernel: FahCore_78.exe[2495]: segfault at 65e46e4 ip 08087963 sp bf7fe3dc error 4 in FahCore_78.exe[8048000+322000]
Apr 27 10:20:57 Tower kernel: FahCore_78.exe[2536]: segfault at 65e46e4 ip 08087963 sp bf3fe3dc error 4 in FahCore_78.exe[8048000+322000]
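To see whether a program is crashing repeatedly, a quick tally of segfault lines per program can help. This is a sketch only: segfault_counts is a made-up name, and the pattern assumes kernel messages in the "name[pid]: segfault at ..." form shown above.

```shell
# Count kernel-reported segfaults per program name in a syslog.
# segfault_counts is a hypothetical helper. Output: "<program> <count>".
segfault_counts() {
    sed -n 's/.*kernel: \([^[]*\)\[[0-9]*]: segfault at .*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage would be segfault_counts /var/log/syslog, which prints one line per crashing program.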

Link to comment

Those last two segfaults are from the Folding At Home (http://folding.stanford.edu/) distributed computing program. They actually name the cores they use with an 'exe' filename extension.

 

nighttraitor, try not starting F@H until after the formats are done. The additional stress F@H puts on the system (memory and CPU usage) might be triggering corner cases not fully exercised by the kernel's normal test suite.

Link to comment
