Add a disk on the array, and now it can't Start at all.

February 15, 2011

My system had been working for years.

At the end of last year, I removed a defective drive that was already empty.

I replaced it with a new, empty 1.5TB drive. It was not yet part of the array, but it was ready to be added.

Of course, I precleared it successfully.

Because I had run out of free space, I decided to add this disk to the array.

I ran a successful parity check.

I didn't touch anything else on the system.

Then I did the following, in this order:

- commented out the cache command in the go file, so that I would be able to stop the array after a start without having to wait for hours

- stopped the array

- assigned the drive on the devices page

- started the array

- ticked the "I'm sure" checkbox and started the format

- waited until it was done

- stopped the array

Bad luck: I don't know why, but some job was running on disk 10, and it would not unmount.

Somehow the cache script was still running, so I ran cache -q over telnet, but the disk still did not unmount.

(Until now, every problem I have had with unRAID has come from not being able to stop it cleanly in a reasonable time.)

Because I was in a hurry, I made a mistake, and the system stopped abruptly.

Now the system boots, but unRAID doesn't start.

After a reboot, the system is ready but shows "Stopped. Configuration valid."

If I press Start, the disks are marked as "mounting", then it stops, always with the same message: "Stopped. Configuration valid."

In the syslog, I can only see something like "start STOPPED" and then "do_run: lock_rdev error: -6"

No further explanation at all...

Help please.

syslog-2011-02-15.txt

February 15, 2011

Author

Is it because of the new drive?

At the end of the formatting process, everything was looking fine.

I've read somewhere that there is a procedure for removing an empty drive. Is that a good idea?

And, btw, what is the meaning of these lines:

Feb 15 23:56:25 Tower emhttp: shcmd (4): /usr/local/sbin/set_ncq sdq 1 >/dev/null
Feb 15 23:56:25 Tower emhttp: shcmd (5): /usr/local/sbin/set_ncq sdf 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (6): /usr/local/sbin/set_ncq sdg 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (7): /usr/local/sbin/set_ncq sdi 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (: /usr/local/sbin/set_ncq sdh 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (9): /usr/local/sbin/set_ncq sdd 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (10): /usr/local/sbin/set_ncq sdc 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (11): /usr/local/sbin/set_ncq sde 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (12): /usr/local/sbin/set_ncq sdb 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (13): /usr/local/sbin/set_ncq sda 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (14): /usr/local/sbin/set_ncq sdl 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (15): /usr/local/sbin/set_ncq sdm 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (16): /usr/local/sbin/set_ncq sdn 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (17): /usr/local/sbin/set_ncq sdo 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (18): /usr/local/sbin/set_ncq sdp 1 >/dev/null

Feb 15 23:56:25 Tower emhttp: shcmd (19): /usr/local/sbin/set_ncq sdj 1 >/dev/null

It's very weird: all the data is still there (I hope) and all the drives show green, but the array just won't start. :-[

Anyone to help? Have I lost everything?

February 16, 2011

Those commands in the log are merely setting the NCQ parameter on each respective drive.

February 16, 2011

Author

So, it's normal, right?

What is wrong with my system? Everything looks fine but it won't start.

I'm completely lost!

February 16, 2011

You've not lost anything.

Most of the time, when people have not been able to start the array, it is because some add-on is preventing it, so step 1 is to disable all the extras and then reboot.
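A quick way to see what the go file still runs once the extras are commented out is to list only its active lines. This is a minimal sketch: `active_lines` is a hypothetical helper name, and the path `/boot/config/go` is an assumption (the usual location on the flash drive), not something stated in this thread.

```shell
#!/bin/bash
# active_lines (hypothetical helper): print every line of a script that is
# neither blank nor a comment, i.e. what would actually run at boot.
active_lines() {
    grep -vE '^[[:space:]]*(#|$)' "$1"
}

# Assumed location of the go file on the flash drive:
# active_lines /boot/config/go
```

If every add-on has really been disabled, the only line left should be the one that starts emhttp.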

February 16, 2011

Author


Thank you for your reassuring answer.

After commenting out the ntfs-3g and uu lines, the go file looks like a stock one.

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &

#cp /boot/config/samba/smb.conf /etc/samba
#smbcontrol smbd reload-config
#installpkg /boot/packages/ntfs-3g-2009.3.8-i486-1.tgz
#/boot/packages/cache_dirs -B -w
#/boot/unmenu/uu

But after the reboot, no change.

I don't think there are any add-ons left compared with a stock unRAID, but where else should I look?

I have a copy of the USB boot key from a month ago, when everything worked, and the hardware has not changed since.

The only changes are the assignment of the disk described above and, of course, the format.

Is it a good idea to put that content back on the key?

Normally, it should work if there is no hardware failure now, shouldn't it?

EDIT: Of course, I understand that my array would be unprotected, because the parity would be wrong (with one drive fewer). Bad idea.

February 16, 2011

> After commenting out the ntfs-3g and uu lines, the go file looks like a stock one.

Yes, I think you are right.

> Is it a good idea to put the content back on the key?

You can certainly try that. It would simply ignore the newly added drive and it would not know the parity was out of date. (The current parity takes into account the formatting) Expect many errors on a parity check as it corrects parity.

One thing you would want to double-check is whether your power supply is up to the task. Is it possible you've just gone over its limit to supply start-up current to the drives?

What specific power supply make/model are you using?

Joe L.

February 16, 2011

Author


Very unlikely: I have already run two additional external drives from this power supply (through an external Molex connector).

The drive I just assigned was already powered before.

As written in the signature, the power supply is a qtec 500w.

Btw, swapping this power supply is at least a full day's work.

> You can certainly try that. It would simply ignore the newly added drive and it would not know the parity was out of date. (The current parity takes into account the formatting) Expect many errors on a parity check as it corrects parity.

Isn't this step a little risky? I don't know how well the contents of the key match the parity at this point.

February 16, 2011

Author

Very disturbing.

I restored the old key contents as suggested. Apart from the drive added in between having disappeared, everything looks the same, and the problem is unchanged.

Just in case, I've attached the new syslog.

Now I'm completely stuck.

syslog-2011-02-16.txt

February 16, 2011

Author

And, just in case, since the last drive wasn't needed, I have now disconnected its power.

No change.

I can't start my array.

Tower kernel: md: do_run: lock_rdev error: -6

Is any more info needed?

February 16, 2011

Author

After reading this in the syslog:

Feb 15 23:56:25 Tower emhttp: get_fstype: open /dev/sda1: No such file or directory

I'm wondering whether the problem lies with the sda1 drive.

In that case, wouldn't the better solution be to swap this drive for another one (I have one right here) and rebuild it using the correct configuration (the one from when the problem started, including the new drive)?

Does this new drive need to be in a particular state (precleared, cleared, formatted in another unRAID system)? The current one is NTFS-formatted.

February 16, 2011

The partition on sda is gone:

Feb 16 02:33:58 Tower kernel: sda: unknown partition table
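Lines like this are easy to miss in a long syslog. Here is a minimal sketch for pulling out the messages that mattered in this thread, assuming the syslog has been saved to a file. `scan_syslog` is a hypothetical helper name, and the pattern list covers only the messages quoted in this topic; it is not an exhaustive list of failure messages.

```shell
#!/bin/bash
# scan_syslog (hypothetical helper): show the disk-related messages quoted
# in this thread, each prefixed with its line number in the log (-n).
scan_syslog() {
    grep -nE 'unknown partition table|lock_rdev error|get_fstype: open .*: No such file' "$1"
}

# Example, using one of the attachment names from this thread:
# scan_syslog syslog-2011-02-16.txt
```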

February 16, 2011

Author


Yes, that's what I understand.

Now, how do I get back to a working system?

Simply remove sda, so that unRAID starts and runs slowly from the parity drive, and then rebuild onto another drive?

Or try to rewrite the Master Boot Record on the failed drive (faster)?

And, as for the diagnosis: this drive was already unmounted when the array was abruptly switched off. Is this normal behaviour?

February 16, 2011

That would do it, wouldn't it.

You can repair it.

Before you do, please post the output of the following commands:

fdisk -l -u /dev/sda

dd if=/dev/sda count=1 2>/dev/null | od -x -A d

Joe L.

February 16, 2011

Author

root@Tower:~# fdisk -l -u /dev/sda

Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk identifier: 0x00000000

Disk /dev/sda doesn't contain a valid partition table
root@Tower:~#

root@Tower:~# dd if=/dev/sda count=1 2>/dev/null | od -x -A d
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000512
root@Tower:~#
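The dump above shows that all 512 bytes of sector 0 are zero (the `*` stands for repeated identical lines), so both the partition table and the 0x55 0xAA boot signature that normally ends an MBR are gone. Here is a minimal, read-only sketch for checking just that signature. `mbr_signature` and `check_mbr` are hypothetical helper names, and the target can be a device such as /dev/sda or an image file.

```shell
#!/bin/bash
# mbr_signature (hypothetical helper): print the two bytes at offsets
# 510-511 of the first sector as lowercase hex, e.g. "55aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# check_mbr (hypothetical helper): say whether the boot signature is there.
check_mbr() {
    local sig
    sig=$(mbr_signature "$1")
    if [ "$sig" = "55aa" ]; then
        echo "signature present"
    else
        echo "signature missing (found: ${sig:-nothing})"
    fi
}

# Example (reads only, never writes):
# check_mbr /dev/sda
```

A present signature only says that something MBR-like exists in sector 0; it does not prove that the partition entry matches what unRAID expects.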


I can repair it, but how?

If it means writing the MBR by hand, that will be tricky, since I'm a Linux noob. Or maybe there is a tool I don't know about.

If it means rebuilding, do I have to prepare the drive? Should I use another one, or the disk already in place? Is it safe?

Thanks for your help

February 16, 2011

> I can repair it, but how?

It is by use of a tool I wrote to help another user.

Find it here:

http://lime-technology.com/forum/index.php?topic=5072.msg47122#msg47122

> If it means rebuilding, do I have to prepare the drive? Should I use another one, or the disk already in place? Is it safe?

I think you can use the existing disk. The utility will only write the MBR and make it exactly as unRAID would have made it.

After using it the fdisk command should show a partition.

Unzip it to your flash drive, then type

/boot/unraid_partition_disk.sh /dev/sda

then, as long as you're sure you have the right disk, invoke it with the -p option.

/boot/unraid_partition_disk.sh -p /dev/sda

Joe L.

February 16, 2011

Author

Ok, I'll try this in a moment.

Just in case, and because I can normally trust the parity disk more than the failed sda disk: is there a way to keep the system from running a correcting parity check?

AFAIK, when I press the Start button, the parity check will also start automatically, and if there are sync errors, they will be written to the parity disk, right?

But those errors are on the data disk, so the parity data would be changed from correct to incorrect.

In that case, isn't it a better idea to simply take out the failed disk and replace it with another one, so that the data is rebuilt correctly?

Is there a way to start the array without an automatic parity check?

Thnx

February 16, 2011

Author

Bad news.

I applied the MBR fix.

Then I started the array, and of course the parity check started immediately.

I didn't see any error counter, but the parity disk was being written to the whole time, so I stopped the parity check.

It shows a lot of parity errors, as I suspected.

I've stopped the array now, and I don't know what to do.

Rebuild onto a new disk, with the errors I created?

Where did I go wrong?

syslog attached.

syslog-2011-02-16a.txt

February 16, 2011


The parity checks that were started were in NOCORRECT mode, therefore no changes were actually written to the parity drive.

From your log:

Feb 16 13:32:08 Tower kernel: mdcmd (69): check NOCORRECT

Can you get to the files on the drive that had the missing partition now?

Joe L.

February 16, 2011

Author

I issued a check NOCORRECT before starting the array, and again after stopping it.

The first one, before the start, didn't work, and I don't know why.

Feb 16 13:30:08 Tower kernel: mdcmd (46): check NOCORRECT
Feb 16 13:30:08 Tower kernel:
Feb 16 13:30:08 Tower kernel: md: check_array: not started

The other was launched after stopping the array.

Feb 16 13:32:08 Tower kernel: mdcmd (69): check NOCORRECT

But, in between, there was the automatic parity check:

Feb 16 13:30:55 Tower kernel: mdcmd (51): check

And after that I can read:

Feb 16 13:30:56 Tower kernel: md: parity incorrect: 128
Feb 16 13:30:56 Tower kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 136
Feb 16 13:30:56 Tower kernel: REISERFS (device md1): using ordered data mode
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 144
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 152
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 160
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 168
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 176
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 184
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 192
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 200
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 208
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 216
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 224
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 232
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 240
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 248
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 256
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 264
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 272
Feb 16 13:30:56 Tower kernel: md: parity incorrect: 280

So, unless there is something I don't understand, the parity disk was corrupted when I started the array.

If that's not the case, I still have a chance to rebuild the data disk by replacing it now, don't I?

Yes, I have access to the files on the disk that had the missing partition.

Those are mainly video files, and it seems that I can use some of them.

Is there a utility to verify the integrity of the files from the disk that had the missing partition?
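The sequence of checks described in this post can be pulled out of a saved syslog in one pass. This is a minimal sketch: `summarise_checks` is a hypothetical helper name, and it only assumes the log line formats quoted above.

```shell
#!/bin/bash
# summarise_checks (hypothetical helper): list every "check" command sent to
# the md driver, with its time of day, then count the parity-mismatch lines.
summarise_checks() {
    local log=$1
    echo "check commands issued:"
    # Matches e.g. "kernel: mdcmd (51): check" and "... check NOCORRECT";
    # prints the time (third word of the line) and the command text.
    awk -F': ' '/mdcmd \([0-9]+\): check/ { split($1, t, " "); print t[3], $NF }' "$log"
    echo "parity incorrect messages: $(grep -c 'md: parity incorrect' "$log")"
}

# Example, using the attachment name from this thread:
# summarise_checks syslog-2011-02-16a.txt
```

With the excerpts quoted in this post, the list would show `check NOCORRECT` at 13:30:08, a plain `check` at 13:30:55, and `check NOCORRECT` again at 13:32:08, which makes the one run without NOCORRECT easy to spot.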

February 16, 2011

I missed that additional start of the "check"

I would un-assign the parity drive (or un-plug it) so a parity check cannot occur.

I would then perform a file system check on the drive that had the missing partition.

reiserfsck --check /dev/md9

(I think it is disk 9 that was the drive with the missing MBR, use the correct "md" device if I'm incorrect.)

The issue is that your existing parity data includes the formatting of the disk you now have removed from your array.

I think you can probably revert to your "current" config, the one that includes the newly formatted drive.

Then, the parity is probably more correct (with the exceptions of the changes it just made when you started it most recently)

If you have a spare disk, I'd go ahead and replace the disk with the previously missing MBR and get as much as possible re-constructed onto the replacement.

February 17, 2011

Author

It's an all day job...

The first thing I have done so far is simply to copy whatever data can be read from the disk that had the missing partition onto the new disk I just added.

Of course, this disk was nearly full, so as expected it's a fairly long process, with the added complication that I have to skip the lost files.

After this, I did a quick check of the copy.

Now there are around 300 errors in the syslog, like these:

Feb 16 17:10:02 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 0 does not match to the expected one 1
Feb 16 17:10:02 Tower kernel: REISERFS error (device md9): vs-5150 search_by_key: invalid format found in block 32769. Fsck?

I think these errors correspond to the sync errors that remain on the array (according to check NOCORRECT, around 40 sync errors still show up at the start of the check).

Now I have to re-create a clean disk (md9) and finally recheck the parity completely.

I have completely abandoned the reconstruction option; it's far too late for that now.

> I would un-assign the parity drive (or un-plug it) so a parity check cannot occur.
>
> I would then perform a file system check on the drive that had the missing partition.
>
> reiserfsck --check /dev/md9

So:

- stopped the array

- unassigned the parity disk

- ticked the "I'm sure..." checkbox

- started the array

- then, in a PuTTY window:

root@Tower:~# reiserfsck --check /dev/md9
reiserfsck 3.6.21 (2009 www.namesys.com)

*************************************************************
** If you are using the latest reiserfsprogs and it fails **
** please email bug reports to [email protected], **
** providing as much information as possible -- your **
** hardware, kernel, patches, settings, all reiserfsck **
** messages (including version), the reiserfsck logfile, **
** check the syslog file for any related information. **
** If you would like advice on using this program, support **
** is available for $25 at www.namesys.com/support.html. **
*************************************************************

Will read-only check consistency of the filesystem on /dev/md9
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Thu Feb 17 12:49:05 2011
###########
Partition /dev/md9 is mounted with write permissions, cannot check it
root@Tower:~#

I cannot run this check.

I've had a look at the SMART status of the three drives involved so far: the parity, md9 (failed) and md15 (the new one). There is nothing unusual.

Now, some questions:

- Why do I have to do this without parity enabled? (I know the parity is nearly lost anyway, but is it a good idea?)

- The data on md9 is going to be erased anyway, so I want to create a clean file system on this disk (not check it; I know there are errors). How do I do that?

Thanx

February 17, 2011

To run the file system check you must first un-mount the drive.

umount /mnt/disk9
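Before re-running reiserfsck it can help to confirm that the device really is unmounted. Here is a minimal sketch that looks the device up in the kernel's mount table. `is_mounted` is a hypothetical helper name, and the optional second argument exists only so the function can be tried against a saved copy of the table.

```shell
#!/bin/bash
# is_mounted (hypothetical helper): succeed (exit 0) if the given device
# appears in the first column of the mount table, fail (exit 1) otherwise.
is_mounted() {
    local dev=$1 table=${2:-/proc/mounts}
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$table"
}

# Example:
# if is_mounted /dev/md9; then echo "still mounted"; else echo "not mounted"; fi
```

This only reports the mount state; it does not unmount anything, and a disk that is in use will still refuse to unmount.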

February 17, 2011

Author


If it's unmounted, it will be ready to format; that seems a better idea.

Why should I do this without parity enabled?

Anyway, I know I have to rebuild the parity disk, since it's lost now, but I don't like walking a tightrope.

February 17, 2011


Typically, the answer would be yes, since the fixes it makes would also be applied to the parity disk.

The --check will not fix anything, but it will tell you which option to apply next to the reiserfsck command.
