Server Crashes During Data-Rebuild



I've been trying to replace one of the 2TB drives with a pre-cleared 3TB drive. During the data-rebuild the server appears to crash, and it brings down my whole wired network (as soon as I disconnect the server, the network is back up and running). There isn't even any text displayed on the monitor I have connected. :o

 

When I reboot the server, the log starts over, so I can't see what errors might have been logged before the crash. I've already successfully upgraded one of the 1TB drives.

 

I'm going to see if I can try and pre-clear another 3TB drive, but since that will take many many many hours, I was wondering if anyone had any ideas as to the cause of this mysterious crashing, or how to get a useable log.

 

Thanks.

Link to comment


If you are changing out a smaller drive with a larger one, you don't need to run pre-clear.  Please attempt the operation again.  Before starting the rebuild, click on the 'Log' button in the menu bar; this will open a separate window and display syslog entries as they occur.  If the system locks up then you should be able to select/copy the text in the window and post here.
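If the Log window freezes together with the server, one possible fallback is to mirror the syslog to storage that survives a reboot. The following is only a minimal sketch, not an official unRAID tool: the function names `copy_new_lines` and `mirror_forever` are made up for illustration, and the destination path is an assumption (it presumes the flash drive is mounted at /boot). Each copy is followed by an fsync so that the last lines written before a lock-up have a chance of reaching the device.

```python
import os
import time

def copy_new_lines(src_path, dst_path, offset=0):
    """Append everything in src_path past byte `offset` to dst_path.

    Returns the new offset so the caller can resume from there.
    """
    with open(src_path, "rb") as src:
        src.seek(offset)
        data = src.read()
    if data:
        with open(dst_path, "ab") as dst:
            dst.write(data)
            dst.flush()
            os.fsync(dst.fileno())  # ask the OS to write the bytes out now
    return offset + len(data)

def mirror_forever(src_path, dst_path, interval=5.0):
    """Poll src_path and keep dst_path up to date until interrupted."""
    offset = 0
    while True:
        offset = copy_new_lines(src_path, dst_path, offset)
        time.sleep(interval)
```

Something like `mirror_forever("/var/log/syslog", "/boot/syslog-rebuild.txt")` could be started from a console session before kicking off the rebuild; both paths are assumptions and should be adjusted to the actual system. Note that frequent writes add wear to the flash drive, so this is best used only while chasing the crash.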

Link to comment

/usr/bin/tail -n 40 -f /var/log/syslog
Apr 4 17:28:42 Asgard emhttp: shcmd (824): mkdir /mnt/disk12
Apr 4 17:28:42 Asgard emhttp: shcmd (825): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/md12 /mnt/disk12 |& logger
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): found reiserfs format "3.6" with standard journal
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): using ordered data mode
Apr 4 17:28:42 Asgard kernel: reiserfs: using flush barriers
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): journal params: device md12, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): checking transaction log (md12)
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): replayed 3 transactions in 0 seconds
Apr 4 17:28:42 Asgard kernel: REISERFS (device md12): Using r5 hash to sort names
Apr 4 17:28:42 Asgard emhttp: shcmd (826): chmod 777 '/mnt/disk12'
Apr 4 17:28:42 Asgard emhttp: shcmd (827): chown nobody:users '/mnt/disk12'
Apr 4 17:28:42 Asgard emhttp: shcmd (828): mkdir /mnt/cache
Apr 4 17:28:42 Asgard emhttp: shcmd (829): set -o pipefail ; mount -t reiserfs -o user_xattr,acl,noatime,nodiratime /dev/sda1 /mnt/cache |& logger
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): found reiserfs format "3.6" with standard journal
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): using ordered data mode
Apr 4 17:28:42 Asgard kernel: reiserfs: using flush barriers
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): journal params: device sda1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): checking transaction log (sda1)
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): replayed 3 transactions in 0 seconds
Apr 4 17:28:42 Asgard kernel: REISERFS (device sda1): Using r5 hash to sort names
Apr 4 17:28:43 Asgard emhttp: shcmd (830): chmod 777 '/mnt/cache'
Apr 4 17:28:43 Asgard emhttp: shcmd (831): chown nobody:users '/mnt/cache'
Apr 4 17:28:43 Asgard emhttp: shcmd (832): mkdir /mnt/user0
Apr 4 17:28:43 Asgard emhttp: shcmd (833): /usr/local/sbin/shfs /mnt/user0 -disks 16777214 -o noatime,big_writes,allow_other,use_ino
Apr 4 17:28:43 Asgard emhttp: shcmd (834): mkdir /mnt/user
Apr 4 17:28:43 Asgard emhttp: shcmd (835): /usr/local/sbin/shfs /mnt/user -disks 16777215 2000000 -o noatime,big_writes,allow_other,use_ino
Apr 4 17:28:43 Asgard emhttp: shcmd (836): crontab -c /etc/cron.d - <<< "# Generated mover schedule: 30 4 * * * /usr/local/sbin/mover |& logger"
Apr 4 17:28:43 Asgard emhttp: shcmd (837): /usr/local/sbin/emhttp_event disks_mounted
Apr 4 17:28:43 Asgard emhttp_event: disks_mounted
Apr 4 17:28:43 Asgard kernel: mdcmd (55): check CORRECT
Apr 4 17:28:43 Asgard kernel: md: recovery thread woken up ...
Apr 4 17:28:43 Asgard kernel: md: recovery thread rebuilding disk4 ...
Apr 4 17:28:43 Asgard kernel: md: using 1536k window, over a total of 2930266532 blocks.
Apr 4 17:28:44 Asgard emhttp: shcmd (838): :>/etc/samba/smb-shares.conf
Apr 4 17:28:44 Asgard emhttp: Restart SMB...
Apr 4 17:28:44 Asgard emhttp: shcmd (839): killall -HUP smbd
Apr 4 17:28:44 Asgard emhttp: shcmd (840): ps axc | grep -q rpc.mountd
Apr 4 17:28:44 Asgard emhttp: _shcmd: shcmd (840): exit status: 1
Apr 4 17:28:44 Asgard emhttp: shcmd (841): /usr/local/sbin/emhttp_event svcs_restarted
Apr 4 17:28:44 Asgard emhttp_event: svcs_restarted

 

It's now showing as starting the rebuild, but the log stopped recording.
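As a quick sanity check on the last md lines above, the block count can be converted into a size to confirm that the rebuild is targeting the full 3TB drive. This is just a rough sketch: the function name `rebuild_size` is hypothetical, and the conversion assumes that each md block is 1 KiB.

```python
import re

# Matches the md line that announces the rebuild window and total size.
MD_TOTAL = re.compile(r"md: using (\d+)k window, over a total of (\d+) blocks")

def rebuild_size(log_text):
    """Return (window_kib, total_blocks) from the first md rebuild line, or None."""
    match = MD_TOTAL.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

line = ("Apr 4 17:28:43 Asgard kernel: md: using 1536k window, "
        "over a total of 2930266532 blocks.")
window_kib, total_blocks = rebuild_size(line)

# Assumption: one md block is 1 KiB (1024 bytes).
total_bytes = total_blocks * 1024
print(f"{total_bytes / 1e12:.2f} TB")  # prints "3.00 TB"
```

Under that assumption the figure matches the 3TB replacement drive, so the rebuild did start against the whole disk before the log went quiet.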

Link to comment
  • 1 year later...

Happy New Year to all!

 

I'm sorry to revive an older topic, but I have the exact same issue (I'm not sure about the network part). I just started a fourth attempt at the data rebuild. It is still running (at 1.3%), but I'm attaching the syslog as it stands now anyway. I hope to be able to get one after a crash (though I hope it doesn't crash, of course).

 

Any ideas what may be causing this?

 

Thanks a lot!

syslog_DR.txt.zip

Link to comment


I see nothing suspicious in the syslog (I am not an expert but think I would spot something obvious).

 

I DO see a ton of packages being installed. I would suggest booting into safe mode and seeing whether that fixes the problem. If it does, you know you have an incompatibility with one of the packages.

Link to comment

Thanks a lot for your support! It is still running now (5%). Next attempt I will for sure do it in maintenance mode, great suggestion.

 

NOT maintenance mode - safe mode.

 

Safe mode is a boot option, not something you pick from the GUI.

Link to comment

I'm probably not searching in the right places for guidance on booting into Safe Mode. Could you please point me in the right direction?

It is one of the options that appears on the Boot menu if you are running with a console+keyboard, or via IPMI for equivalent functionality.  If you are running headless then you can edit the syslinux.cfg file on the flash to change which entry is the default one.
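For the headless case, the edit amounts to moving the `menu default` line under the Safe Mode entry. The sketch below shows one way to do that on the file's text; the helper name `make_default` is hypothetical, and the sample menu is an assumed layout, not copied from anyone's flash drive, so compare it against your own syslinux.cfg (and keep a backup) before changing anything.

```python
def make_default(cfg_text, wanted):
    """Move `menu default` under the first label whose name contains `wanted`."""
    # Drop any existing default marker so only one entry ends up with it.
    lines = [l for l in cfg_text.splitlines()
             if l.strip().lower() != "menu default"]
    out = []
    placed = False
    for line in lines:
        out.append(line)
        stripped = line.strip()
        if (not placed and stripped.lower().startswith("label ")
                and wanted.lower() in stripped[6:].lower()):
            out.append("  menu default")
            placed = True
    if not placed:
        # Refuse to return a menu that has no default entry at all.
        raise ValueError(f"no label containing {wanted!r}")
    return "\n".join(out) + "\n"

# Assumed example menu; real files may differ.
SAMPLE = """\
default menu.c32
label unRAID OS
  menu default
  kernel bzimage
  append initrd=bzroot
label unRAID OS Safe Mode (no plugins)
  kernel bzimage
  append initrd=bzroot unraidsafemode
"""

print(make_default(SAMPLE, "Safe Mode"))
```

Running the same function again with the name of the normal entry would restore the original default once the rebuild has finished.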

Link to comment

Safe mode is a boot option. You need to stop the array, reboot the server, and select that boot option from the boot menu.

 

If you run headless, there is a way to make safe mode the default menu item.

 

See THIS POST and dgaschk's sig for information about how to get to a completely stock boot.

Link to comment

I must be losing my mind on this one.

 

I have two brand new 4TB drives (from different production batches). 4TB#1 failed to rebuild several times; on the last crash I noticed it stopped at 22%. I assumed it was DOA. With 4TB#2 I followed your suggestion. Since I run the unit headless, I "stock booted" the unit. And guess what: with 4TB#2 the (stock) system also crashed at 22%.

 

I am clueless and starting to get frustrated.

 

Any ideas?

 

Thank you!

 

Edit: crashing at 22% with both drives must have been a coincidence. I'm now at 32%, fingers crossed...

Edit2: still no clue why it crashed at least 6 times, but this Data rebuild was successful. Many thanks for your support!

Link to comment
