JustinAiken Posted August 27, 2010
One of my drives disappeared the other day. The array stopped, and the disk turned red, reporting that it wasn't found. I tried powering off the computer and reseating the drive, but it still wasn't found. I also tried moving it to another slot, and it still didn't show up. I ordered a new 2TB drive to replace it, ran preclear on it, assigned it to the old disk's slot in the array, and started the array to rebuild it overnight. When I woke up this morning, it said the drive is unformatted, and the management area below says "Format will create a file system in all Unformatted disks, discarding all data currently on those disks." Is it still possible to save the drive? Why didn't the rebuild work?
Rajahal Posted August 27, 2010
What version of unRAID are you running?
Rajahal Posted August 27, 2010
When you were moving the disk around, did you reassign it on the 'devices' page? unRAID won't assign it automatically; you have to tell it which disk slot you want it to use. As long as the new disk is the same size as or larger than your failed disk, unRAID should rebuild all the data onto it. Once you assign it to the slot of the failed disk, unRAID should say something like 'replacing disk/upgrading disk'. You can then start the array, and the data rebuild should begin.
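The rebuild Rajahal describes relies on the parity disk. unRAID's single parity is commonly described as the XOR of the same byte position across all data disks, so a missing disk can be recomputed from parity plus the surviving disks. A minimal sketch of that idea, assuming plain byte-wise XOR over tiny equal-length "disks" (the function names are hypothetical, not unRAID's code):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_disks):
    # Parity holds the XOR of the same byte position on every data disk.
    return xor_blocks(data_disks)


def rebuild_missing(parity, surviving_disks):
    # XOR is its own inverse: parity XOR all surviving disks = the missing disk.
    return xor_blocks([parity, *surviving_disks])


disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
parity = compute_parity(disks)
rebuilt = rebuild_missing(parity, [disks[0], disks[2]])
print(rebuilt == disks[1])  # True
```

Note that this reconstruction is purely bit-level: it reproduces whatever bytes parity says the missing disk held, including a damaged filesystem, and never checks whether those bytes mount.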
JustinAiken Posted August 27, 2010
Yes, that's exactly what I did, but when it finished rebuilding and started the array, it told me that the disk is unformatted.
Rajahal Posted August 27, 2010
Hmm, well, that is definitely a problem. I don't know what to do at this point, so we'll have to wait for someone more experienced to chime in. Can you post a syslog?
JustinAiken Posted August 27, 2010
http://pastebin.com/S0uQw5E1
graywolf Posted August 27, 2010
I see a couple of things that look like issues. I'm not totally sure what they mean or the best method to recover at this point:
Aug 27 14:25:27 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 234818035. Fsck?
Aug 27 14:25:27 Tower kernel: REISERFS (device md2): Remounting filesystem read-only
Aug 27 14:25:27 Tower kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Aug 27 14:25:27 Tower kernel: REISERFS (device md2): Using r5 hash to sort names
Aug 27 14:25:27 Tower kernel: REISERFS (device md5): replayed 2 transactions in 3 seconds
Aug 27 14:25:27 Tower kernel: REISERFS (device md4): Using r5 hash to sort names
Aug 27 14:25:27 Tower kernel: REISERFS (device md3): Using r5 hash to sort names
Aug 27 14:25:27 Tower kernel: REISERFS (device md1): Using r5 hash to sort names
Aug 27 14:25:27 Tower kernel: REISERFS (device md6): Using r5 hash to sort names
Aug 27 14:25:27 Tower kernel: REISERFS (device md5): Using r5 hash to sort names
Aug 27 14:28:05 Tower kernel: REISERFS error (device md5): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 366280704 is corrupted: first bit must be 1
Aug 27 14:28:05 Tower kernel: REISERFS (device md5): Remounting filesystem read-only
So you have at least 2 drives mounted read-only. I didn't fully search the logs; there could very well be more.
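Counting read-only drives by eye is easy to get wrong in a long syslog. A small sketch, assuming the ReiserFS kernel message format quoted above, that pulls every md device remounted read-only out of saved syslog text (the function name is hypothetical):

```python
import re

# Matches lines like: "REISERFS (device md5): Remounting filesystem read-only"
READ_ONLY = re.compile(r"REISERFS \(device (md\d+)\): Remounting filesystem read-only")


def read_only_devices(syslog_text):
    """Return md devices ReiserFS remounted read-only, ordered by device number."""
    found = set(READ_ONLY.findall(syslog_text))
    return sorted(found, key=lambda dev: int(dev[2:]))


sample = """\
Aug 27 14:25:27 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 234818035. Fsck?
Aug 27 14:25:27 Tower kernel: REISERFS (device md2): Remounting filesystem read-only
Aug 27 14:25:27 Tower kernel: REISERFS (device md5): replayed 2 transactions in 3 seconds
Aug 27 14:28:05 Tower kernel: REISERFS (device md5): Remounting filesystem read-only
"""
print(read_only_devices(sample))  # ['md2', 'md5']
```

Sorting on the numeric suffix keeps md10 after md3, which a plain string sort would not.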
JustinAiken Posted August 27, 2010
The second drive with issues is probably the old drive that died; it's still plugged into a slot (but not part of the array).
GK20 Posted August 28, 2010
If it is not part of unRAID, it wouldn't be associated with an md device. disk2 was reported "unformatted" simply because unRAID cannot mount it. So, DO NOT format any disk unless you are sure what you are doing.
Given that there are no low-level SATA errors associated with those file system errors, at this moment I suggest:
(a) Run SMART tests on both disks.
(b) Reboot your machine. If that doesn't help, run memtest to verify memory; keep in mind unRAID is using a RAM file system.
(c) If both steps don't yield good results, contact Limetech.
One thing I don't quite understand: unRAID gave up on disk2 because of a file system error right after kicking off data reconstruction on disk2, so how can unRAID still report later that data reconstruction finished without any problem?
------------------------------------------------------------
Aug 26 23:27:54 Tower kernel: mdcmd (17): start RECON_DISK
Aug 26 23:27:54 Tower kernel: unraid: allocating 43960K for 1280 stripes (8 disks)
Aug 26 23:27:55 Tower kernel: md: recovery thread woken up ...
Aug 26 23:27:55 Tower kernel: md: recovery thread rebuilding disk2 ...
Aug 26 23:27:55 Tower kernel: md: using 1152k window, over a total of 1953514552 blocks.
Aug 26 23:27:55 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 23026 does not match to the expected one 4
Aug 26 23:27:55 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 234818035. Fsck?
Aug 26 23:27:55 Tower kernel: REISERFS (device md2): Remounting filesystem read-only
Aug 26 23:27:55 Tower kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
Aug 26 23:27:55 Tower kernel: REISERFS (device md2): Using r5 hash to sort names
Aug 26 23:27:55 Tower logger: mount: /dev/md2: can't read superblock
Aug 26 23:27:55 Tower emhttp: _shcmd: shcmd (131): exit status: 32
Aug 26 23:27:55 Tower emhttp: disk2 mount error: 32
Aug 26 23:27:55 Tower emhttp: shcmd (132): rmdir /mnt/disk2
Aug 26 23:34:45 Tower kernel: REISERFS error (device md5): reiserfs-2025 reiserfs_cache_bitmap_metadata: bitmap block 366280704 is corrupted: first bit must be 1
Aug 26 23:34:45 Tower kernel: REISERFS (device md5): Remounting filesystem read-only
Aug 27 09:10:17 Tower kernel: md: sync done. time=34941sec rate=55908K/sec
Aug 27 09:10:17 Tower kernel: md: recovery thread sync completion status: 0
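The last two log lines are consistent with the md layer having walked the entire disk, which is one way to read GK20's puzzle: the rebuild copies raw blocks and does not consult the filesystem, so it can finish with status 0 even while ReiserFS on md2 refuses to mount. A quick check of the arithmetic, assuming the "blocks" are 1 KiB units (an assumption; it is what makes the K/sec rate and the 2TB drive size line up):

```python
# Figures copied from the syslog lines above.
total_blocks = 1953514552   # "over a total of 1953514552 blocks"
elapsed_s = 34941           # "sync done. time=34941sec"
reported_rate = 55908       # "rate=55908K/sec"

computed_rate = total_blocks // elapsed_s   # K per second, assuming 1 KiB blocks
hours = elapsed_s / 3600
size_bytes = total_blocks * 1024

print(computed_rate)      # 55908, matches the reported rate
print(round(hours, 1))    # 9.7
print(size_bytes)         # 2000398901248, i.e. a 2 TB drive
```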
JustinAiken Posted August 28, 2010
Oh dear, this looks bad. I ran memtest for about 36 hours a little over a week ago, with 0 errors. I just turned the machine on (it's been off since I discovered the problem), and it's the same issue. Here's the pastebin of that syslog in case it helps: http://pastebin.com/gU2yRPus
And here's the SMART report for each of the questionable drives:
Drive 2: http://pastebin.com/jM8EAjLG
Drive 5: http://pastebin.com/ayPfwKGy
The web interface and telnet work, but Samba is not working at all.
JustinAiken Posted August 28, 2010
Actually, Samba is working, just not on my Mac; my Linux XBMC box can still see and use it. Looks like disk2 isn't being emulated...
GK20 Posted August 28, 2010
Quote: "Looks like disc2 isn't being emulated..."
Pretty much it is. You can telnet to your machine, then:
(a) Run "cd /mnt".
(b) Run "ls -l".
(c) There should be NO disk2 entry there.
(d) You should be able to see data under disk5, because this disk is now mounted read-only.
At this moment, I think the options are limited:
(a) You can try to fix disk5 by referring to this discussion: http://lime-technology.com/forum/index.php?topic=4866.0
(b) Once you fix disk5, try a different SATA port and cable for disk2 to see if it helps. If not, get another disk and restart data reconstruction.
(c) You have HPA turned on (are you using a Gigabyte motherboard?). Search the forum for a solution to this issue; HPA is a problem in the unRAID environment.
Aug 28 11:49:56 Tower kernel: ata4.00: HPA detected: current 1465147055, native 1465149168
Aug 28 11:49:56 Tower kernel: ata4.00: ATA-8: WDC WD7500AACS-00D6B1, 01.01A01, max UDMA/133
Aug 28 11:49:56 Tower kernel: ata4.00: 1465147055 sectors, multi 0: LBA48 NCQ (depth 31/32)
Aug 28 11:49:56 Tower kernel: ata4.00: configured for UDMA/133
Aug 28 11:49:56 Tower kernel: scsi 0:0:3:0: Direct-Access ATA WDC WD7500AACS-0 01.0 PQ: 0 ANSI: 5
Aug 28 11:49:56 Tower kernel: [850]: scst_suspend_activity:599:suspend_count 0
Aug 28 11:49:56 Tower kernel: sd 0:0:3:0: [sde] 1465147055 512-byte logical blocks: (750 GB/698 GiB)
Aug 28 11:49:56 Tower kernel: sd 0:0:3:0: [sde] Write Protect is off
Aug 28 11:49:56 Tower kernel: sd 0:0:3:0: [sde] Mode Sense: 00 3a 00 00
Aug 28 11:49:56 Tower kernel: sd 0:0:3:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 28 11:49:56 Tower kernel: sde:
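The "HPA detected" line gives both the current and the native sector count, so the size of the hidden area follows directly. A sketch, assuming 512-byte sectors as reported in the `sd` line above (the regex and function name are illustrative, not part of any tool):

```python
import re

# Matches the libata message: "HPA detected: current <n>, native <m>"
HPA_LINE = re.compile(r"HPA detected: current (\d+), native (\d+)")


def hpa_hidden_bytes(line, sector_size=512):
    """Return (hidden_sectors, hidden_bytes) parsed from an 'HPA detected' line."""
    match = HPA_LINE.search(line)
    if match is None:
        raise ValueError("no HPA message in line")
    current, native = (int(g) for g in match.groups())
    hidden = native - current
    return hidden, hidden * sector_size


line = "Aug 28 11:49:56 Tower kernel: ata4.00: HPA detected: current 1465147055, native 1465149168"
print(hpa_hidden_bytes(line))  # (2113, 1081856)
```

About 1 MiB is hidden: small, but enough to change the size the drive reports once the HPA is removed.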
JustinAiken Posted August 28, 2010
Quote: "Looks like disc2 isn't being emulated..." / "Pretty much it is"
Shouldn't the unRAID array emulate the disk over the shares by reading parity and the rest of the drives? I'm fixing disk5 now, then I'll move on to disk2. Glad you caught that disk7 has HPA; I didn't know that! I got a non-Gigabyte motherboard specifically to avoid this kind of thing, but I guess HPA was set back when the drive was in my desktop. That would explain why no data was ever written to the drive.
GK20 Posted August 28, 2010
disk2 is being emulated by unRAID now; what I meant is that this disk is missing from unRAID, which is why you shouldn't see a /mnt/disk2 entry. However, given that your disk5 is read-only and there is also the HPA issue, I wouldn't rely on this "emulation" for too long, because you are one step away from a "double failure".
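GK20's check (`cd /mnt`, `ls -l`, look for a missing disk2 entry) can also be scripted. The syslog above shows emhttp running `rmdir /mnt/disk2` after the mount error, so a missing directory is a useful signal. A sketch, assuming the /mnt/diskN naming and seven data disks (the "8 disks" stripe line suggests seven data disks plus parity, but the count is an assumption); it only inspects directory names, not mount state, and the function name is hypothetical:

```python
import os
import re
import tempfile


def missing_disk_dirs(mnt_root, expected_count):
    """List the diskN names (1..expected_count) that have no directory under mnt_root."""
    present = {
        name
        for name in os.listdir(mnt_root)
        if re.fullmatch(r"disk\d+", name)
        and os.path.isdir(os.path.join(mnt_root, name))
    }
    expected = [f"disk{i}" for i in range(1, expected_count + 1)]
    return [name for name in expected if name not in present]


# Demo against a temporary directory standing in for /mnt.
with tempfile.TemporaryDirectory() as fake_mnt:
    for i in (1, 3, 4, 5, 6, 7):
        os.mkdir(os.path.join(fake_mnt, f"disk{i}"))
    print(missing_disk_dirs(fake_mnt, 7))  # ['disk2']
```

On the server itself the call would be `missing_disk_dirs("/mnt", 7)`.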
JustinAiken Posted August 28, 2010
Yeah, I'm not writing to the disks; I was just going to back up some files from disk2 in case anything happens. But the files from disk2 aren't being emulated over the shares. They're just gone.
GK20 Posted August 28, 2010
Fix disk5 and the HPA first, and if disk2 is still not showing up through the shares, physically disconnect it from the system. That should give unRAID the impression that this disk is gone, and in theory unRAID should start to emulate it.
JustinAiken Posted August 28, 2010
Okay, good idea! I'm almost done with the tree rebuild on disk5...
JustinAiken Posted August 28, 2010
--rebuild-tree took a long time and found many errors to fix, but it's all good now. I removed the HPA from one of the drives (actually I have two drives with HPA on them; I didn't know that until now). After rebooting the machine, the syslog tells me that I still have HPA on the other drive (it was there in the syslog I posted earlier; we both just missed it), and now there are no red lines in the syslog (not even about disk2, which is the disk that started this mess! Maybe there will be when I start the array).
The problem is that now unRAID thinks I want to upgrade the drive I just removed the HPA from. I'd be too scared to rebuild a drive given the delicate state my array is in, but this drive was empty before all of this, so it should be no big deal. Looks like I'd better wait to fix the HPA on the other disk until all these errors are sorted out.
EDIT: As soon as I started the array so that the HPA drive could get sorted out, the syslog did indeed contain red lines about disk2: http://pastebin.com/fVrHx17V
JustinAiken Posted August 29, 2010
After the HPA drive was rebuilt, I removed disk2 on the devices page and started the array. It should definitely be emulating it now, but none of the files I had on disk2 are showing up in the user shares, and disk2 isn't available as a Samba share.
GK20 Posted August 29, 2010
If disk2 is now the only disk in trouble, unRAID should be able to emulate it for you. Do you have a syslog?
JustinAiken Posted August 29, 2010
Yeah, the only error messages I see are about disk2: http://pastebin.com/HZXERGb5
There's still HPA on one of my drives, but I can't safely remove it until I get the array sorted out; otherwise unRAID will rebuild the HPA'd drive after I remove the HPA, and with disk2 in this odd state, that could be a disaster!
Joe L. Posted August 30, 2010
If you remove the HPA, unRAID will think it is a completely different disk, and you will effectively have created a two-disk failure. unRAID matches the model/serial number AND the size when determining whether a disk is the one expected in the array. Don't mess with the HPA until after you get disk2 fixed or replaced.
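Joe L.'s rule can be modeled as a three-field comparison. A minimal sketch, assuming a hypothetical `DiskId` record (unRAID's real check is internal and not shown in this thread); the model and sector counts come from the HPA log excerpt above, and the serial is a made-up placeholder:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DiskId:
    """Hypothetical identity record: model, serial, and size in sectors."""
    model: str
    serial: str
    sectors: int


def is_expected_disk(assigned: DiskId, found: DiskId) -> bool:
    # Model, serial AND size must all match; a size change alone is enough
    # for the disk to be treated as a different disk.
    return (assigned.model, assigned.serial, assigned.sectors) == (
        found.model, found.serial, found.sectors)


with_hpa = DiskId("WDC WD7500AACS-00D6B1", "EXAMPLE-SERIAL", 1465147055)
hpa_removed = replace(with_hpa, sectors=1465149168)  # same drive, native size

print(is_expected_disk(with_hpa, with_hpa))     # True
print(is_expected_disk(with_hpa, hpa_removed))  # False
```

In this model, removing the HPA from a drive while disk2 is already missing produces a second "unexpected" disk, which is the two-disk failure Joe L. warns about.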
GK20 Posted August 30, 2010
Comparing this syslog with the earlier one, I am afraid the parity you have might not have been trustworthy from the very beginning. This latest syslog shows that your disk2 is gone, so any I/O to this disk should be emulated by unRAID, whether regular data I/O or even superblock I/O. However, when unRAID tries to mount disk2 in emulated mode, the superblock data it reconstructs doesn't pass some internal sanity checks, so unRAID cannot mount this disk even when emulated. As long as this disk cannot be mounted, you will not be able to see any of the data on it. Unless there is some other way to fix this situation, I cannot think of a way to reconstruct the data on disk2, even if we throw in another new disk.
------------------------------------------------------------
Aug 29 13:49:36 Tower kernel: md: disk2 removed
Aug 29 13:49:37 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 23026 does not match to the expected one 4
Aug 29 13:49:37 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 234818035. Fsck?
Aug 29 13:49:37 Tower kernel: REISERFS (device md2): Remounting filesystem read-only
Aug 29 13:49:37 Tower kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
------------------------------------------------------------
Aug 28 11:49:58 Tower logger: mount: /dev/md2: can't read superblock
Aug 28 11:49:58 Tower kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 23026 does not match to the expected one 4
Aug 28 11:49:58 Tower emhttp: _shcmd: shcmd (21): exit status: 32
Aug 28 11:49:58 Tower kernel: REISERFS error (device md2): vs-5150 search_by_key: invalid format found in block 234818035. Fsck?
Aug 28 11:49:58 Tower emhttp: disk2 mount error: 32
Aug 28 11:49:58 Tower kernel: REISERFS (device md2): Remounting filesystem read-only
Aug 28 11:49:58 Tower kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
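GK20's concern about parity can be illustrated with the same toy XOR model: an emulated read is only as good as parity, and if parity no longer matches the data disks, the "emulated" disk comes back wrong without any I/O error being raised. A sketch, assuming single XOR parity and 8-byte toy disks (names hypothetical; this is not a claim about what happened on this particular array):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


disk1 = b"DATA-ONE"
disk2 = b"REISERFS"   # stands in for disk2's superblock region
disk3 = b"thirdone"
parity = xor_blocks([disk1, disk2, disk3])

# disk3 changes, but parity is never updated: parity is now stale.
disk3 = b"THIRDONE"

# Emulating the missing disk2 from stale parity silently yields the wrong bytes.
emulated_disk2 = xor_blocks([parity, disk1, disk3])
print(emulated_disk2 == b"REISERFS")  # False
```

A filesystem sanity check on such reconstructed metadata would fail in the same way the ReiserFS messages above report, even though the md layer itself sees nothing wrong.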
Joe L. Posted August 30, 2010
Even though the drive is emulated, you can still run reiserfsck on it (it will fix the emulated drive). Then, when you put a good drive into place, it should rebuild onto it.
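Joe L.'s suggestion works because, in a parity array, a write to a missing (emulated) disk has to be absorbed by parity: new parity = old parity XOR old emulated contents XOR new contents. A later rebuild onto a new drive then reproduces the repaired contents. A sketch under the same toy XOR model, assuming single parity (the helper names are hypothetical, not unRAID's code):

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_emulated(parity, surviving):
    # Missing disk = parity XOR every surviving data disk.
    return xor_blocks([parity, *surviving])


def write_emulated(parity, surviving, new_data):
    # Read-modify-write: fold out the old emulated bytes, fold in the new ones.
    old_data = read_emulated(parity, surviving)
    return xor_blocks([parity, old_data, new_data])


disk1, disk3 = b"\x11\x22\x33\x44", b"\x55\x66\x77\x88"
damaged_disk2 = b"\xde\xad\xbe\xef"
parity = xor_blocks([disk1, damaged_disk2, disk3])

repaired_disk2 = b"\x00\x01\x02\x03"   # stands in for what a repair writes
parity = write_emulated(parity, [disk1, disk3], repaired_disk2)

# Reads (and a rebuild onto a new drive) now return the repaired contents.
print(read_emulated(parity, [disk1, disk3]) == repaired_disk2)  # True
```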