DosSpider Posted December 3, 2011

This may have been posted already, but I don't see it; sorry if it's a repost. I had a disk go bad and didn't realize it (life got somewhat in the way of keeping tabs on my server). I noticed when I started seeing data missing on the network shares. It looks like the server is now seeing the other (believed to be good) disks as unformatted.

I've poked around on the forums and wiki a little, and it seems like the server may have just had a problem mounting the reiserfs, which is causing the drives to show up as unformatted? I'm hoping there's a solution here that doesn't involve losing the data on the unformatted disks. I see on the wiki that there may be a way to repair the FS?

Basically I'm just looking for some help getting pointed in the right direction on what to do. Right now the array is stopped, the server is still running, and all drives are still present in the same configuration. I'm looking for what I should do first here. I don't want to do the wrong thing and lose data if I don't have to.

Running build 4.5.3; devices screenshot and syslog are posted. disk2 shows up as bad. The first time I tried to start the array, disk1 showed up as unformatted. The last time I tried to start the array, disk1 and disk5 showed up as unformatted. Thanks in advance for any help or assistance with where to look for solutions.

syslog-2011-12-03.txt
Joe L. Posted December 3, 2011

I think a powerdown and reboot is in order. The syslog does not show how the drives are being initially identified. Whatever you do, do NOT press the "Format" button. It would only complicate your issues.
DosSpider (Author) Posted December 3, 2011

Powering down now, syslog when it's back up? I've read enough of the forums, and screwed up with that format button early on in my unRAID experience, to assume that unless I'm pretty much 100% sure I want to format something, I don't want to hit that button.
Joe L. Posted December 3, 2011

"Powering down now, syslog when it's back up?"

Yes.

"unless I'm pretty much 100% sure I want to format something I don't want to hit that button."

You are very wise.
DosSpider (Author) Posted December 3, 2011

New syslog. Array now stopped, showing disk1 and disk2 as bad.

syslog-2011-12-03.txt
DosSpider (Author) Posted December 3, 2011

Actually disk1 is showing as missing... I checked the device page and it shows up as the correct drive (SAMSUNG_HD154UI_S1Y6J1LS722478), but on the unRAID main page that drive is listed as the parity disk.
Joe L. Posted December 4, 2011

"Actually disk1 is showing as missing...I check the device page and it shows up as the correct drive (SAMSUNG_HD154UI_S1Y6J1LS722478), but on the unraid main page it shows that drive listed as the parity disk."

And the syslog says it is the wrong disk.
DosSpider (Author) Posted December 4, 2011

Hrm. Tried rebooting again; now I'm back to disk1 showing the correct disk but showing unformatted, and disk2 showing up as a bad disk and also as unformatted. I'm not really sure how to figure out what I need to do. Is there some type of utility to check disk1? I don't think it's a good idea to just replace disk2, since right now disk1 is showing unformatted and there should be information on it...

Newest syslog...

syslog-2011-12-03.txt
DosSpider (Author) Posted December 4, 2011

Doing some more poking around on the forums, I see where you've helped a couple of people with various issues that might be related in some way to mine, Joe... Looks like maybe the MBR of that first disk has gotten screwed up somehow? I'm not sure I want to go mucking about with trying to fix that until I've got a better idea of what's going wrong. I tried running SMART for disks 1 and 2, as well as a file system check from unmenu. Nothing very informative from either, but here are the reports.

disk1-fsck.txt disk2-fsck.txt disk1-SMART-short.txt disk2-SMART-short.txt
fitbrit Posted December 4, 2011

I'm in a very similar-sounding situation right now. Two drives showed errors and I didn't see this until last night. I decided to replace the drive with more errors first today, but suddenly a third drive completely red-balled. After a reboot I have three drives supposedly unformatted. I've possibly lost 5 TB of media.
lionelhutz Posted December 4, 2011

This is what I see in the log. All disks are recognized, and a reiserfs is found on every disk, as you can see from the "found reiserfs" lines below.

Dec 3 19:44:09 FileServer kernel: REISERFS (device md4): found reiserfs format "3.6" with standard journal
Dec 3 19:44:09 FileServer kernel: REISERFS (device md4): using ordered data mode
Dec 3 19:44:09 FileServer kernel: REISERFS (device md4): journal params: device md4, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Dec 3 19:44:09 FileServer kernel: REISERFS (device md4): checking transaction log (md4)
Dec 3 19:44:09 FileServer kernel: REISERFS (device md5): found reiserfs format "3.6" with standard journal
Dec 3 19:44:09 FileServer kernel: REISERFS (device md5): using ordered data mode
Dec 3 19:44:09 FileServer kernel: REISERFS (device md3): found reiserfs format "3.6" with standard journal
Dec 3 19:44:09 FileServer kernel: REISERFS (device md3): using ordered data mode
Dec 3 19:44:09 FileServer kernel: REISERFS (device md2): found reiserfs format "3.6" with standard journal
Dec 3 19:44:09 FileServer kernel: REISERFS (device md2): using ordered data mode
Dec 3 19:44:09 FileServer kernel: REISERFS (device md1): found reiserfs format "3.6" with standard journal
Dec 3 19:44:09 FileServer kernel: REISERFS (device md1): using ordered data mode

Disk1 then starts to throw a bunch of link errors like this, with the link being reset multiple times.

Dec 3 19:44:41 FileServer kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x1800000 action 0x6
Dec 3 19:44:41 FileServer kernel: ata4.00: BMDMA stat 0x5
Dec 3 19:44:41 FileServer kernel: ata4: SError: { LinkSeq TrStaTrns }
Dec 3 19:44:41 FileServer kernel: ata4.00: failed command: WRITE DMA
Dec 3 19:44:41 FileServer kernel: ata4.00: cmd ca/00:08:bf:00:00/00:00:00:00:00/e0 tag 0 dma 4096 out
Dec 3 19:44:41 FileServer kernel: res 51/84:08:bf:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Dec 3 19:44:41 FileServer kernel: ata4.00: status: { DRDY ERR }
Dec 3 19:44:41 FileServer kernel: ata4.00: error: { ICRC ABRT }
Dec 3 19:44:41 FileServer kernel: ata4: hard resetting link
Dec 3 19:44:41 FileServer kernel: ata4: nv: skipping hardreset on occupied port
Dec 3 19:44:41 FileServer kernel: ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Dec 3 19:44:41 FileServer kernel: ata4.00: configured for UDMA/100
Dec 3 19:44:41 FileServer kernel: ata4: EH complete

The attempts to reset the connection to disk1 are finally abandoned.

Dec 3 19:46:56 FileServer kernel: ata4: reset failed, giving up

This messes up unRAID and causes disk1 to be knocked out of the array:

Dec 3 19:45:25 FileServer emhttp: disk1 mount error: 32
Dec 3 19:45:25 FileServer emhttp: shcmd (22): rmdir /mnt/disk1
Dec 3 19:46:56 FileServer kernel: md: disk1 read error
Dec 3 19:46:56 FileServer kernel: handle_stripe read error: 50168/1, count: 1
Dec 3 19:46:56 FileServer kernel: REISERFS warning (device md2): journal-1212 journal_read_transaction: REPLAY FAILURE fsck required! buffer write failed
Dec 3 19:46:56 FileServer kernel: REISERFS warning (device md2): reiserfs-2006 journal_init: Replay Failure, unable to mount
Dec 3 19:46:56 FileServer kernel: REISERFS warning (device md2): sh-2022 reiserfs_fill_super: unable to initialize journal space
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] READ CAPACITY(16) failed
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Sense not available.
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] READ CAPACITY failed
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Sense not available.
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Asking for cache data failed
Dec 3 19:46:56 FileServer kernel: sd 4:0:0:0: [sde] Assuming drive cache: write through
Dec 3 19:46:56 FileServer kernel: sde: detected capacity change from 1500301910016 to 0

My guess is that you first need to try replacing the data cable and the power connector on disk1. If that fails, then you need to move disk1 to a different SATA port.

Oddly enough, I don't see anything about disk2 after the attempt to mount it. My guess is that disk1 is stopping disk2 from working. I suspect you're running a SATA card that emulates 2 IDE controllers with 2 disks per controller. With IDE controllers, 1 bad disk on the controller can knock out both of the disks. Knowing what SATA card you're using could help.

Peter
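For anyone who wants to pull the same kinds of lines out of their own syslog, here is a rough Python sketch. It is not an unRAID tool: the function name `summarize`, the event labels, and the `SAMPLE` list are made up for illustration, and the regular expressions are only modeled on the log lines quoted in this thread, so other kernels or unRAID builds may word things differently.

```python
import re

# Patterns modeled on the syslog lines quoted in this thread (an assumption:
# other builds may format these messages differently).
PATTERNS = {
    "link_reset": re.compile(r"kernel: (ata\d+): hard resetting link"),
    "reset_failed": re.compile(r"kernel: (ata\d+): reset failed, giving up"),
    "mount_error": re.compile(r"emhttp: (disk\d+) mount error: \d+"),
    "reiserfs_warning": re.compile(r"REISERFS warning \(device (md\d+)\)"),
}


def summarize(lines):
    """Count matching events, keyed by (event kind, device name)."""
    counts = {}
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                key = (kind, match.group(1))
                counts[key] = counts.get(key, 0) + 1
    return counts


# A few lines copied from the log excerpt in this thread.
SAMPLE = [
    "Dec 3 19:44:41 FileServer kernel: ata4: hard resetting link",
    "Dec 3 19:46:56 FileServer kernel: ata4: reset failed, giving up",
    "Dec 3 19:45:25 FileServer emhttp: disk1 mount error: 32",
    "Dec 3 19:46:56 FileServer kernel: REISERFS warning (device md2): "
    "reiserfs-2006 journal_init: Replay Failure, unable to mount",
]

# Print one row per (event kind, device) pair found in the sample.
for (kind, device), count in sorted(summarize(SAMPLE).items()):
    print(f"{kind:18} {device:6} {count}")
```

To run it against a real log, pass the file's lines instead of `SAMPLE`, for example `summarize(open(path))` with `path` pointing at a saved syslog. A port that shows many link resets followed by "reset failed" is the one to look at first for cabling or power problems.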
DosSpider (Author) Posted December 4, 2011

After doing even more reading on the forums last night, I decided it might be power related (I don't have any experience to speak of reading or interpreting SMART logs). I powered down the box and went to check the cabling and power. I feel fairly certain that the connections on 3 of the 4 SATA drives are good; they're using nice locking connectors. I realize that doesn't ensure the cables are still good. One disk, though, had a connector that seemed a little loose, so I pulled that cable and used a different one. I also double-checked all the power connectors, pulled the case fans off a splitter running from the two IDE drives, and put the fans on their own cable running from the PS, just in case.

Upon reboot it looks like everything is back to where I was when I first started. That is, I have green balls on 0, 1, 3, 4, 5, and 2 is red-balled. I just ran a short SMART test on each of the drives. I'll post the syslog and SMART reports. Would appreciate any input. Thanks in advance.

syslog-2011-12-04.txt disk0-smart-short-20111204.txt disk1-smart-short-20111204.txt disk2-smart-short-20111204.txt
DosSpider (Author) Posted December 4, 2011

And the rest of the SMART reports...

disk3-smart-short-20111204.txt disk4-smart-short-20111204.txt disk5-smart-short-20111204.txt
DosSpider (Author) Posted December 4, 2011

Oh, and I missed it earlier, but the board is an ASUS A8N5X. I'm using all 4 of the onboard SATA ports and 2 IDE drives. Changing the cable and double-checking all of the power seems to at least have the array back to where it was. Based on the syslog and SMART reports, does it seem like it's OK to replace the drive that's red and rebuild?
lionelhutz Posted December 5, 2011

Do a reiserfsck on disk2. If you follow the wiki you can use md2; otherwise, if you stop the array, you have to use sdb1. Don't forget the 1 after the sdb.

I believe you need to stop the array, unassign disk2, start the array, stop the array, assign disk2 and then start the array. This will rebuild the simulated disk2 data back onto the existing disk2, which appears to be OK. When disk2 is unassigned you should check that all the files are being simulated properly from it.

Oddly enough, the syslog is saying that disk2 is recognized and mounted. If you didn't want to rebuild disk2, or were worried that the simulated data is corrupt, then you could probably just initialize the array and rebuild parity. You would lose anything that was written to disk2 while it was being simulated.

Peter
Joe L. Posted December 5, 2011

"Do a reiserfsck on disk2. If you follow the wiki you can use md2, otherwise if you stop the array you have to use sdb1. Don't forget the 1 after the sdb. I believe you need to stop the array, unassign disk2, start the array, stop the array, assign disk2 and then start the array. This will rebuild the simulated disk2 data back onto the existing disk2 which appears to be OK. When disk2 is unassigned you should check that all the files are being simulated properly from it. Oddly enough, the syslog is saying that disk2 is recognized and mounted. If you didn't want to rebuild disk2 or were worried that the simulated data is corrupt then you could probably just initialize the array and rebuild parity. You would lose anything that was written to disk2 while it was being simulated. Peter"

If you do the file-system repair on the raw disk partition (/dev/sdX1) rather than on /dev/md2, then parity will NOT be in sync and you will need to do a correcting parity check to fix it; otherwise parity will show errors when next run.
lionelhutz Posted December 5, 2011

"If you do the file-system repair on the raw (/dev/sdX1 ) disk partition rather than on /dev/md2 then parity will NOT be in sync and you will need to do a correcting parity check to fix it, otherwise parity will show errors when next run."

Yes, but only if you do repairs, not when you do a check.

Peter
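The rule the last few posts arrive at (a check never disturbs parity; a repair disturbs it only when run against the raw partition instead of the md device) can be written down as a small Python sketch. The helper names `build_command` and `parity_check_needed` are invented here, and picking `--fix-fixable` as the repair mode is purely an assumption for illustration; the thread does not say which reiserfsck repair option, if any, would be appropriate. The script only builds the command list and never runs it.

```python
def build_command(device, repair=False):
    """Return the reiserfsck invocation as an argument list (nothing is executed).

    --check is read-only. --fix-fixable stands in for "a repair" here as an
    illustrative assumption, not a recommendation from the thread.
    """
    mode = "--fix-fixable" if repair else "--check"
    return ["reiserfsck", mode, device]


def parity_check_needed(device, repair=False):
    """True when a repair writes to a raw partition, bypassing the md device.

    Writes through /dev/mdN keep parity updated; writes to /dev/sdX1 do not,
    so parity is out of sync afterwards. A read-only check changes nothing.
    """
    on_md_device = device.startswith("/dev/md")
    return repair and not on_md_device


# Walk through the cases discussed in the thread.
for device, repair in [("/dev/md2", False), ("/dev/md2", True),
                       ("/dev/sdb1", False), ("/dev/sdb1", True)]:
    command = " ".join(build_command(device, repair))
    follow_up = "correcting parity check" if parity_check_needed(device, repair) else "none"
    print(f"{command:40} follow-up: {follow_up}")
```

Only the last case, a repair on /dev/sdb1, calls for the correcting parity check Joe L. describes. The device names are the ones mentioned in the thread and will differ from server to server.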