broken drive but can't replace it? [SOLVED]



Hi all,

 

I’ve got what seems like a major issue with my unRAID server.

Disk 4 (500 GB) failed over the weekend, so I shut down the server and replaced it with a new 1 TB drive.

Upon bootup unRAID seemed to format the new disk, but it doesn’t seem happy with it.

 

In the attached screenshot you can see that it says this new disk is ‘wrong’ for some reason.

https://dl.dropboxusercontent.com/u/8815457/temp-unraid01.JPG

 

There are also these errors in the syslog:

May 20 10:06:43 Tower init: Re-reading inittab
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): Filesystem has been set read-only
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)
May 20 10:06:44 Tower last message repeated 8 times
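
For anyone scanning a long syslog for the same symptom, here is a minimal Python sketch that pulls out the FAT-fs kernel messages and reports which device was set read-only. The function names and the regular expression are hypothetical and modelled only on the lines quoted above; the sample text is copied from this post.

```python
import re

# Matches kernel FAT-fs messages such as:
#   "May 20 10:06:44 Tower kernel: FAT-fs (sdb1): Filesystem has been set read-only"
FAT_LINE = re.compile(r"kernel: FAT-fs \((?P<dev>\w+)\): (?P<msg>.*)$")


def find_fat_errors(log_text):
    """Return (device, message) pairs for every FAT-fs kernel line."""
    hits = []
    for line in log_text.splitlines():
        match = FAT_LINE.search(line.strip())
        if match:
            hits.append((match.group("dev"), match.group("msg")))
    return hits


def went_read_only(log_text):
    """Return the set of devices the kernel reported as set read-only."""
    return {dev for dev, msg in find_fat_errors(log_text)
            if "set read-only" in msg}


SAMPLE = """\
May 20 10:06:43 Tower init: Re-reading inittab
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): Filesystem has been set read-only
May 20 10:06:44 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)
"""

if __name__ == "__main__":
    print(went_read_only(SAMPLE))
```

To run it against a real log, pass the contents of the syslog file to `went_read_only` instead of `SAMPLE`.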

 

Please find attached a screen shot and logs.

 

Thanks in advance,

Danny

dmesg.txt

syslog.txt


Did you just stop the array and assign the new drive (I am assuming that the Seagate referenced is the problem drive)?

 

If so, then I think the procedure to follow is:

  • stop the array, and un-assign the drive in question
  • start the array so that it says "missing" for the drive in question (and the drive is being emulated using parity plus the other drives)
  • stop the array and assign the replacement drive
  • start the array to rebuild onto the new drive
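
To illustrate what "emulated using parity plus the other drives" means in the steps above, here is a toy Python sketch of single XOR parity. It is a simplified model, not unRAID's actual code: the disks are hypothetical 4-byte blocks and `xor_blocks` is a name made up for this example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each reduced to a 4-byte block.
disk1 = bytes([0x10, 0x20, 0x30, 0x40])
disk2 = bytes([0x0F, 0x0E, 0x0D, 0x0C])
disk3 = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# Parity is the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 missing, its contents can be recomputed from parity
# plus the surviving disks. A rebuild writes this result to the new drive.
emulated_disk2 = xor_blocks([parity, disk1, disk3])

if __name__ == "__main__":
    print(emulated_disk2 == disk2)
```

This also shows why only one missing drive can be emulated with a single parity disk: with two unknowns, the XOR can no longer be solved for either one.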


I did that, but when I selected the new drive it still said 'wrong'.

 

Screen shot attached.

That screenshot does not say it is wrong!  It says that unRAID has accepted the drive and is rebuilding the contents of the old drive onto the new one.  If you are talking about the colour of its icon, then it is currently orange because the rebuild is in progress, and the icon will turn green when it has finished the rebuild process.


It finished that.

 

But I see some concerning lines in the syslog:

May 20 12:47:01 Tower kernel: write_file: error 30 opening /boot/config/super.dat

May 20 12:47:01 Tower kernel: md: could not write superblock from /boot/config/super.dat
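
A quick way to decode the number in that message, assuming "error 30" is a standard Linux errno value (a guess based on the message format, not something confirmed from the driver source), is a short Python sketch like this:

```python
import errno
import os

code = 30  # the number from "write_file: error 30"

# Look up the symbolic name and the human-readable description.
name = errno.errorcode.get(code, "unknown")
description = os.strerror(code)

if __name__ == "__main__":
    print(code, name, description)
```

On Linux, errno 30 is `EROFS`, the read-only file system error, which would be consistent with the FAT-fs messages earlier in the thread.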

 

I have attached the full log.

 

Perhaps there is a problem with the USB stick itself as I tried writing to it myself:

root@Tower:/boot/custom# touch test

touch: cannot touch `test': Read-only file system

root@Tower:/boot/custom#
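
The same check can be done without writing anything by looking at the mount options. Below is a minimal Python sketch that parses `/proc/mounts`-style text; the helper names and the sample mount table are hypothetical (the real options on a given server may differ), and it assumes the kernel marks the mount `ro` after setting the filesystem read-only.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text.

    The last matching line wins, since later entries describe the mount
    that is currently on top. Returns None if the mount point is absent.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """True if mount_point is listed with the 'ro' option."""
    options = mount_options(mounts_text, mount_point)
    return options is not None and "ro" in options


# Hypothetical sample; on a live system read /proc/mounts instead.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/sdb1 /boot vfat ro,relatime,fmask=0177,dmask=0077 0 0
"""

if __name__ == "__main__":
    print(is_read_only(SAMPLE_MOUNTS, "/boot"))
```

On the server itself this would be called as `is_read_only(open("/proc/mounts").read(), "/boot")`.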

syslog.txt

dmesg.txt

It does sound like your flash drive is corrupt or possibly dead. The usual advice is to put it in your PC and run checkdisk. However, since you have just updated your array configuration, and unRAID was probably unable to write that updated configuration (super.dat) to the flash drive, it is going to be a little more complicated. The very first thing you should do is get a new screenshot so you know exactly which drives belong in which slots, because you are probably going to have to set a new configuration after you get your flash drive taken care of.

 

Also, post that screenshot so we all know exactly what you have. Are all drives including parity showing green balls now?

 

Just post the new screenshot and wait for further advice.


I forgot to reply re plugins, yes I have unMenu and the following installed through it:

 

bwm-ng - Bandwidth Monitor NG (Next Generation), a live bandwidth monitor

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

cxxlibs-6.0.9-i486.tgz library accidentally left out of unRAID 4.4-beta2 through 4.5beta5

Installed, Not Downloaded

 

hdparm v9.27 - get/set hard disk parameters

Installed, Not Downloaded

 

hdparm - v9.37 (read/set hard drive parameters)

Installed, Not Downloaded

 

lsof (list open files)

Installed, Not Downloaded

 

mail and ssmtp - Configure unRAID to be able to send e-mail notifications via the "mail" command.

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

unRAID Status Alert sent hourly by e-mail

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

Monthly Parity Check

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

openssh

Installed, Not Downloaded

 

pci utils (pci utilities)

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

unRAID Power-Down on disk overtemp

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

Clean Powerdown

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

reiserfsprogs version 3.6.21

Installed, Not Downloaded

 

rsync (remote file sync)

Installed, Not Downloaded

 

screen (screen manager with VT100/ANSI terminal emulation)

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 

SMART tools (smartctl hard drive monitoring utilities)

Installed, Not Downloaded

 

infozip (Info-ZIP's zip and unzip utilities)

Currently Installed. Will be automatically Re-Installed upon Re-Boot.

 


Hi,

 

I knew I was behind on the versions, but as I had all the functionality I really wanted, I went with the "if it isn't broken, don't try to fix it" method!

 

I tend to agree with your line of reasoning for stable versions, but it falls apart with release candidates. Tom put them out there with the warning that there could be problems, and there was a reason for each of the next fifteen release candidates. I also know that I had problems with one or two of them and had to go back to the previous rc version. But when one starts down the path of using them, one must have the attitude that one is going to complete the process. Remember that none of the rc's were around long enough to find out whether they were totally bug-free! (Plus, there was always a reason the next rc was required. If there had not been any bugs in an rc version, it would have become the final release!)


Hi, thanks for the info, please find attached the latest screen shot.

 

Many thanks,

Danny

OK, now that we have that: the reason I am concerned is that I think there is a very good chance that, if you reboot, unRAID will not remember that you have replaced a drive, since it couldn't write super.dat.

 

Since you are probably having flash drive problems, you might get a screen shot or printout of any other setting screens in unRAID or unMenu in case you have to recreate any of this.

 

Also run a non-correcting parity check now. If everything is OK, then you can stop the array and shutdown. Put the flash drive in your PC and run checkdisk on it.
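
As a rough illustration of what a non-correcting check does, here is a toy Python sketch: it recomputes XOR parity from the data disks, compares it with the stored parity, and only counts mismatches without writing anything back. The function name and the tiny byte-string "disks" are hypothetical; the real check works on whole drives and reports its own error counts.

```python
def parity_mismatches(data_disks, parity_disk):
    """Count byte positions where stored parity differs from the XOR of the data disks.

    Nothing is written back, which is what makes the check non-correcting.
    """
    mismatches = 0
    for position, stored in enumerate(parity_disk):
        computed = 0
        for disk in data_disks:
            computed ^= disk[position]
        if computed != stored:
            mismatches += 1
    return mismatches


# Two hypothetical 3-byte data disks and two candidate parity blocks.
data = [bytes([1, 2, 3]), bytes([4, 5, 6])]
good_parity = bytes([1 ^ 4, 2 ^ 5, 3 ^ 6])
bad_parity = bytes([1 ^ 4, 0xFF, 3 ^ 6])  # middle byte deliberately wrong

if __name__ == "__main__":
    print(parity_mismatches(data, good_parity), parity_mismatches(data, bad_parity))
```

A result of zero after a rebuild suggests the rebuilt disk and parity agree, which is the condition the advice above is checking for before shutting down.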

 

Report back with the results of running checkdisk and we can go from there.

 

  • 4 weeks later...

OK. Had to review the thread since it's been a few weeks.

 

I think you should be OK to go ahead and try to boot from the repaired flash. It may be that it is still expecting the old disk4 and will say the new one is wrong. If so, since you already did a rebuild and parity check, you should be able to do a New Config and Trust Parity.

 

Go ahead and boot and when you can get to the webGUI post another screenshot before you proceed.

 

 


Hi,

 

I ran a check on the USB key and it seemed to find a problem with one of the files.

 

Please find attached a screen shot of the scan error.

 

Thanks,

Danny

 

I'd suggest deleting the affected file and re-downloading it via unMenu.

 

 


 

 

For that file, no download is needed. Just set that package to NOT re-install on re-boot (that will delete its .auto_install file), and then set it to re-install on re-boot again (that will re-create the file).

 


supermoocow, please verify my understanding of your situation below:

 

You have already done the rebuild onto the new disk4 followed by a successful non-correcting parity check with no errors.

 

However, your flash drive was corrupt and read-only, and super.dat could not be written, so unRAID does not remember the drive was replaced.

 

You are on version 5-rc5, but I doubt the version matters for now. You do have unMenu installed.

 

I think you could just do New Config and Trust Parity at this point.

 

You could also do the rebuild again but that would take more time and possibly allow more problems to appear.

 

Maybe mounting disk4 outside the array and verifying your data is indeed there before doing New Config - Trust Parity would be appropriate.

 

Anyone else have a better idea?

 


Yes, your summary of my situation is correct.

 

I've looked through the syslog again but unfortunately it looks like there is still something wrong with the flash drive.

 

There are these errors:

Jun 17 22:51:33 Tower init: Re-reading inittab

Jun 17 22:51:34 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)

Jun 17 22:51:34 Tower kernel: FAT-fs (sdb1): Filesystem has been set read-only

Jun 17 22:51:34 Tower kernel: FAT-fs (sdb1): error, fat_get_cluster: invalid cluster chain (i_pos 2822999)

Jun 17 22:51:34 Tower last message repeated 8 times

 

A full copy of the log is attached.

 

Thanks in advance,

Danny

syslog.txt.txt
