A bad disk & unformatted drive(s)


Posted

This may have gotten posted already, but I don't see it, sorry if it's a repost.

 

I had a disk go bad and didn't realize it (life has gotten somewhat in the way of keeping tabs on my server).  I noticed when I started seeing data missing on the network shares.  It looks like the server is now seeing the other (believed to be good) disks as unformatted.  I've poked around on the forums and wiki a little, and it seems the server may just have had a problem mounting the reiserfs, which is causing the drives to show up as unformatted?  I'm hoping there's a solution here that doesn't involve losing the data on the unformatted disks.  I see on the wiki that there may be a way to repair the FS?  Basically I'm just looking for some help getting pointed in the right direction on what to do.  Right now the array is stopped, the server is still running, and all drives are still present in the same configuration.

 

Looking for what I should do first here.  I don't want to do the wrong thing and lose data if I don't have to.

 

Running build 4.5.3, posted devices screenshot and syslog.

 

disk2 shows up as bad.  First time I tried to start the array disk1 showed up as unformatted.  Last time I tried to start the array disk1 and disk5 showed up as unformatted.

 

Thanks in advance for any help or assistance with where to look for solutions.

syslog-2011-12-03.txt

Unraid_Devices.png

Posted

I think a powerdown and reboot is in order.

 

The syslog does not show how the drives are being initially identified. 

 

Whatever you do, do NOT press the "Format" button.  It would only complicate your issues.

Posted

Powering down now, syslog when it's back up?

 

I've read enough of the forums, and screwed up with that format button early on in my UnRaid experience, to pretty much assume that unless I'm pretty much 100% sure I want to format something I don't want to hit that button. ;)

Posted

Powering down now, syslog when it's back up?

Yes.

unless I'm pretty much 100% sure I want to format something I don't want to hit that button. ;)

You are very wise. ;)
Posted

Actually disk1 is showing as missing... I checked the device page and it shows up as the correct drive (SAMSUNG_HD154UI_S1Y6J1LS722478), but on the unRAID main page that drive is listed as the parity disk.

Posted

Actually disk1 is showing as missing...I check the device page and it shows up as the correct drive (SAMSUNG_HD154UI_S1Y6J1LS722478), but on the unraid main page it shows that drive listed as the parity disk.

and the syslog says it is the wrong disk.
Posted

Hrm.  Tried rebooting again, now I'm back to disk1 showing the correct disk, but showing unformatted, and disk2 showing up as a bad disk and showing up as unformatted.

 

Not really sure how to figure out what I need to do.  Is there some type of utility to check disk1?  I don't really think it's a good idea to just replace disk2 since right now disk1 is showing unformatted and there should be information on it...

 

Newest syslog...

syslog-2011-12-03.txt

Posted

Doing some more poking around on the forums, I see where you've helped a couple of people with various issues that might be related in some way to mine, Joe...

 

Looks like maybe the MBR of that first disk has gotten screwed up somehow?  Not sure I want to go mucking about with trying to fix that until I've got a better idea of what's going wrong.

 

Tried running SMART tests for disks 1 & 2, as well as a file system check from unMENU. Nothing very informative from either, but here are the reports.

disk1-fsck.txt

disk2-fsck.txt

disk1-SMART-short.txt

disk2-SMART-short.txt

Posted

I'm in a very similar sounding situation right now. Two drives showed errors and I didn't see this until last night. I decided to replace the drive with the more errors first today, but suddenly a third drive completely red-balled. After a reboot I have three drives supposedly unformatted. I've possibly lost 5 TB of media.

 

Posted

This is what I see in the log.

 

All disks are recognized, and a reiserfs filesystem is found on every one of them, as the "found reiserfs" lines below show.

 

Dec  3 19:44:09 FileServer kernel: REISERFS (device md4): found reiserfs format "3.6" with standard journal
Dec  3 19:44:09 FileServer kernel: REISERFS (device md4): using ordered data mode
Dec  3 19:44:09 FileServer kernel: REISERFS (device md4): journal params: device md4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Dec  3 19:44:09 FileServer kernel: REISERFS (device md4): checking transaction log (md4)
Dec  3 19:44:09 FileServer kernel: REISERFS (device md5): found reiserfs format "3.6" with standard journal
Dec  3 19:44:09 FileServer kernel: REISERFS (device md5): using ordered data mode
Dec  3 19:44:09 FileServer kernel: REISERFS (device md3): found reiserfs format "3.6" with standard journal
Dec  3 19:44:09 FileServer kernel: REISERFS (device md3): using ordered data mode
Dec  3 19:44:09 FileServer kernel: REISERFS (device md2): found reiserfs format "3.6" with standard journal
Dec  3 19:44:09 FileServer kernel: REISERFS (device md2): using ordered data mode
Dec  3 19:44:09 FileServer kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal
Dec  3 19:44:09 FileServer kernel: REISERFS (device md1): using ordered data mode

 

Disk1 then starts to throw a bunch of link errors like the ones below, with the link being reset multiple times.

 

Dec  3 19:44:41 FileServer kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x1800000 action 0x6
Dec  3 19:44:41 FileServer kernel: ata4.00: BMDMA stat 0x5
Dec  3 19:44:41 FileServer kernel: ata4: SError: { LinkSeq TrStaTrns }
Dec  3 19:44:41 FileServer kernel: ata4.00: failed command: WRITE DMA
Dec  3 19:44:41 FileServer kernel: ata4.00: cmd ca/00:08:bf:00:00/00:00:00:00:00/e0 tag 0 dma 4096 out
Dec  3 19:44:41 FileServer kernel:          res 51/84:08:bf:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Dec  3 19:44:41 FileServer kernel: ata4.00: status: { DRDY ERR }
Dec  3 19:44:41 FileServer kernel: ata4.00: error: { ICRC ABRT }
Dec  3 19:44:41 FileServer kernel: ata4: hard resetting link
Dec  3 19:44:41 FileServer kernel: ata4: nv: skipping hardreset on occupied port
Dec  3 19:44:41 FileServer kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec  3 19:44:41 FileServer kernel: ata4.00: configured for UDMA/100
Dec  3 19:44:41 FileServer kernel: ata4: EH complete

 

The attempts to reset the connection to disk1 are finally abandoned.

 

Dec  3 19:46:56 FileServer kernel: ata4: reset failed, giving up

 

This confuses unRAID and causes disk1 to be knocked out of the array:

 

Dec  3 19:45:25 FileServer emhttp: disk1 mount error: 32
Dec  3 19:45:25 FileServer emhttp: shcmd (22): rmdir /mnt/disk1

 

Dec  3 19:46:56 FileServer kernel: md: disk1 read error
Dec  3 19:46:56 FileServer kernel: handle_stripe read error: 50168/1, count: 1
Dec  3 19:46:56 FileServer kernel: REISERFS warning (device md2): journal-1212 journal_read_transaction: REPLAY FAILURE fsck required! buffer write failed
Dec  3 19:46:56 FileServer kernel: REISERFS warning (device md2): reiserfs-2006 journal_init: Replay Failure, unable to mount
Dec  3 19:46:56 FileServer kernel: REISERFS warning (device md2): sh-2022 reiserfs_fill_super: unable to initialize journal space
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] READ CAPACITY(16) failed
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Sense not available.
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] READ CAPACITY failed
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Sense not available.
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Asking for cache data failed
Dec  3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Assuming drive cache: write through
Dec  3 19:46:56 FileServer kernel: sde: detected capacity change from 1500301910016 to 0

 

My guess is that you first need to try replacing the disk cable and power supply connectors on disk1. If that fails, then you need to move disk1 to a new SATA port.
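The walk through the syslog above can be sketched as a small Python script. This is only an illustration, assuming the log lines look like the excerpts quoted in this post; the category names and the `summarize_syslog` helper are made up for this sketch and are not part of unRAID.

```python
import re
from collections import defaultdict

# Patterns modeled on the log excerpts quoted in this thread.
# The category names (dictionary keys) are hypothetical labels.
PATTERNS = {
    "reiserfs_found": re.compile(r"REISERFS \(device (md\d+)\): found reiserfs format"),
    "crc_error": re.compile(r"(ata\d+)(?:\.\d+)?: error: \{[^}]*\bICRC\b"),
    "link_reset": re.compile(r"(ata\d+): hard resetting link"),
    "reset_gave_up": re.compile(r"(ata\d+): reset failed, giving up"),
    "mount_error": re.compile(r"emhttp: (disk\d+) mount error"),
    "replay_failure": re.compile(r"REISERFS warning \(device (md\d+)\):.*REPLAY FAILURE"),
}


def summarize_syslog(lines):
    """Count matching events per category and per device/port."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                summary[name][match.group(1)] += 1
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {name: dict(counts) for name, counts in summary.items()}


sample = [
    'Dec  3 19:44:09 FileServer kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal',
    "Dec  3 19:44:41 FileServer kernel: ata4.00: error: { ICRC ABRT }",
    "Dec  3 19:44:41 FileServer kernel: ata4: hard resetting link",
    "Dec  3 19:46:56 FileServer kernel: ata4: reset failed, giving up",
    "Dec  3 19:45:25 FileServer emhttp: disk1 mount error: 32",
]
for category, counts in summarize_syslog(sample).items():
    print(category, counts)
```

Repeated ICRC errors and link resets on a single ata port, like ata4 here, usually point at the cable or connector before the drive itself, which is why the cabling is the first thing to check.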

 

Oddly enough, I don't see anything about disk2 after the attempt to mount it. My guess is that disk1 is stopping disk2 from working. I suspect you're running a SATA card that emulates 2 IDE controllers with 2 disks per controller. With IDE controllers, 1 bad disk on the controller can knock out both of the disks. Knowing what SATA card you're using could help.

 

Peter

Posted

After doing even more reading on the forums last night I decided it might be power related (I don't have any experience to speak of reading or interpreting SMART logs).  I powered down the box and went to check the cabling and power.  I feel fairly certain that the connections on 3 of the 4 SATA drives are good, since they're using nice locking connectors. I realize that doesn't ensure the cables are still good, but one disk had a connector that seemed a little loose.  I went ahead and pulled that cable and used a different one on that disk.

 

I also double-checked all the power connectors, pulled the case fans off of a splitter running from the two IDE drives, and put the fans on their own cable running from the PSU just in case.

 

Upon reboot it looks like everything is back to where it was when I first started.  That is, I have green balls on 0, 1, 3, 4 and 5, and 2 is red-balled.

 

I just ran a short SMART test on each of the drives.  I'll post the syslog and SMART reports.

 

Would appreciate any input.

 

Thanks in advance..

syslog-2011-12-04.txt

disk0-smart-short-20111204.txt

disk1-smart-short-20111204.txt

disk2-smart-short-20111204.txt

Posted

Oh and missed it earlier, but the board is an ASUS A8N5X.  I'm using all 4 of the onboard SATA ports and 2 IDE drives.

 

Changing the cable and double-checking all of the power seems to at least have the array back to where it was.

 

Based on the syslog and SMART reports, does it seem OK to replace the drive that's red and rebuild?

 

 

Posted

Do a reiserfsck on disk2. If you follow the wiki you can use md2, otherwise if you stop the array you have to use sdb1. Don't forget the 1 after the sdb.

 

I believe you need to stop the array, unassign disk2, start the array, stop the array, assign disk2 and then start the array. This will rebuild the simulated disk2 data back onto the existing disk2 which appears to be OK.

 

When disk2 is unassigned you should check that all the files are being simulated properly from it. Oddly enough, the syslog is saying that disk2 is recognized and mounted. If you didn't want to rebuild disk2 or were worried that the simulated data is corrupt then you could probably just initialize the array and rebuild parity. You would lose anything that was written to disk2 while it was being simulated.

 

Peter

 

Posted

If you do the file-system repair on the raw disk partition (/dev/sdX1) rather than on /dev/md2, then parity will NOT be in sync and you will need to do a correcting parity check to fix it; otherwise parity will show errors when it is next checked.
Posted
If you do the file-system repair on the raw (/dev/sdX1 ) disk partition rather than on /dev/md2 then parity will NOT be in sync and you will need to do a correcting parity check to fix it, otherwise parity will show errors when next run.

 

Yes, but only if you do repairs, not when you do a check.
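The rules discussed in the last few posts can be summed up in a minimal Python sketch. It only builds the command line and does not run anything. The `plan_reiserfsck` helper and its parameters are hypothetical names made up for this illustration, and `--fix-fixable` is just one of the repair options reiserfsck offers; which one is appropriate depends on what the check reports.

```python
import re


def plan_reiserfsck(disk_number, raw_partition=None, repair=False):
    """Return (command, correcting_parity_check_needed).

    disk_number   -- unRAID disk number, e.g. 2 for disk2 (/dev/md2)
    raw_partition -- e.g. "/dev/sdb1" when working on the raw partition
                     with the array stopped; None means use the md device
    repair        -- False for a read-only check, True for a repair run
    """
    if raw_partition is None:
        device = f"/dev/md{disk_number}"
    else:
        # The partition number matters: /dev/sdb1, not /dev/sdb.
        if not re.fullmatch(r"/dev/[sh]d[a-z]+\d+", raw_partition):
            raise ValueError(f"expected a partition such as /dev/sdb1, got {raw_partition!r}")
        device = raw_partition

    mode = "--fix-fixable" if repair else "--check"
    command = ["reiserfsck", mode, device]

    # Only writes that bypass the md device leave parity out of sync.
    parity_check_needed = repair and raw_partition is not None
    return command, parity_check_needed


command, needs_parity_check = plan_reiserfsck(2)
print(" ".join(command), "| correcting parity check needed:", needs_parity_check)
```

A check through /dev/md2 comes back with no parity follow-up, while a repair on a raw partition such as /dev/sdb1 is flagged as needing a correcting parity check afterwards.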

 

Peter

 

Archived

This topic is now archived and is closed to further replies.
