dwoods99 Posted August 21, 2013

I replaced a 2TB WD Green drive with a 3TB version, and while rebuilding the data it seems to hang after 70%. The web interface does not respond, and all I can see with ps -ef are hdparm -C /dev/sdl processes that appear to be hung or defunct:

root      5112  2190  0 06:40 ?        00:00:00 /usr/sbin/hdparm -C /dev/sdl
root      5113  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5114  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5115  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5123  2193  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5124  5123  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5125  5124  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5126  5124  0 06:41 ?        00:00:00 wc -l
root      5150  5125  0 06:41 ?        00:00:00 hdparm -C /dev/sdl

I can't stop the array and I can't power down, even from a telnet shell. I've tried hard reboots but the problem persists. Any ideas? I could not spot anything in the syslog:

Aug 18 15:45:25 Moat emhttp: ST3000DM001-1CH166_####HVN (sda) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####6740 (sdb) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####3952 (sdc) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####7613 (sdd) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####3254 (sdf) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EFRX-68AX9N0_WD-####0521 (sdg) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####1569 (sdh) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####9546 (sdi) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00S8B1_WD-####7510 (sdj) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARX-00PASB0_WD-####7189 (sdk) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####5888 (sdl) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####3235 (sdm) 2930266584
Aug 18 15:45:25 Moat kernel: mdcmd (1): import 0 8,0 2930266532 ST3000DM001-1CH166_####QHVN
Aug 18 15:45:25 Moat kernel: md: import disk0: [8,0] (sda) ST3000DM001-1CH166_####QHVN size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (2): import 1 8,96 1953514552 WDC_WD20EFRX-68AX9N0_WD-####0521
Aug 18 15:45:25 Moat kernel: md: import disk1: [8,96] (sdg) WDC_WD20EFRX-68AX9N0_WD-####0521 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (3): import 2 8,112 1953514552 WDC_WD20EARS-00MVWB0_WD-####1569
Aug 18 15:45:25 Moat kernel: md: import disk2: [8,112] (sdh) WDC_WD20EARS-00MVWB0_WD-####1569 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (4): import 3 8,80 1953514552 WDC_WD20EARS-00MVWB0_WD-####3254
Aug 18 15:45:25 Moat kernel: md: import disk3: [8,80] (sdf) WDC_WD20EARS-00MVWB0_WD-####3254 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (5): import 4 8,128 1953514552 WDC_WD20EARS-00MVWB0_WD-####9546
Aug 18 15:45:25 Moat kernel: md: import disk4: [8,128] (sdi) WDC_WD20EARS-00MVWB0_WD-####9546 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (6): import 5 8,144 1953514552 WDC_WD20EARS-00S8B1_WD-####7510
Aug 18 15:45:25 Moat kernel: md: import disk5: [8,144] (sdj) WDC_WD20EARS-00S8B1_WD-####7510 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (7): import 6 8,48 1953514552 WDC_WD20EARS-00MVWB0_WD-####7613
Aug 18 15:45:25 Moat kernel: md: import disk6: [8,48] (sdd) WDC_WD20EARS-00MVWB0_WD-####7613 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (8): import 7 8,32 2930266532 WDC_WD30EZRX-00DC0B0_WD-####3952
Aug 18 15:45:25 Moat kernel: md: import disk7: [8,32] (sdc) WDC_WD30EZRX-00DC0B0_WD-####3952 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (9): import 8 8,192 2930266532 WDC_WD30EZRX-00DC0B0_WD-####3235
Aug 18 15:45:25 Moat kernel: md: import disk8: [8,192] (sdm) WDC_WD30EZRX-00DC0B0_WD-####3235 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (10): import 9 8,176 1953514552 WDC_WD20EARS-00MVWB0_WD-####5888
Aug 18 15:45:25 Moat kernel: md: import disk9: [8,176] (sdl) WDC_WD20EARS-00MVWB0_WD-####5888 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (11): import 10 8,16 2930266532 WDC_WD30EZRX-00DC0B0_WD-####6740
Aug 18 15:45:25 Moat kernel: md: import disk10: [8,16] (sdb) WDC_WD30EZRX-00DC0B0_WD-####6740 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (12): import 11 8,160 1953514552 WDC_WD20EARX-00PASB0_WD-####7189
Aug 18 15:45:25 Moat kernel: md: import disk11: [8,160] (sdk) WDC_WD20EARX-00PASB0_WD-####7189 size: 1953514552
Aug 18 15:45:03 Moat kernel: sd 1:0:4:0: [sdj] Attached SCSI disk
Aug 18 15:45:03 Moat kernel: sd 1:0:5:0: [sdk] Attached SCSI disk
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1: /sbin/ifconfig lo 127.0.0.1
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1: /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1: /sbin/ifconfig eth0 192.168.1.55 broadcast 192.168.1.255 netmask 255.255.255.0
Aug 18 15:45:03 Moat kernel: r8168: eth0: link down
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1: /sbin/route add default gw 192.168.1.1 metric 1
Aug 18 15:45:03 Moat rpc.statd[1222]: Version 1.2.2 starting
Aug 18 15:45:03 Moat sm-notify[1223]: Version 1.2.2 starting
Aug 18 15:45:03 Moat rpc.statd[1222]: Failed to read /var/lib/nfs/state: Success
Aug 18 15:45:03 Moat rpc.statd[1222]: Initializing NSM state
Aug 18 15:45:03 Moat rpc.statd[1222]: Running as root. chown /var/lib/nfs to choose different user
Aug 18 15:45:03 Moat ntpd[1238]: ntpd [email protected] Sat Apr 24 19:01:14 UTC 2010 (1)
Aug 18 15:45:03 Moat ntpd[1239]: proto: precision = 0.260 usec
Aug 18 15:45:03 Moat ntpd[1239]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Aug 18 15:45:03 Moat ntpd[1239]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Aug 18 15:45:03 Moat ntpd[1239]: Listen normally on 1 lo 127.0.0.1 UDP 123
Aug 18 15:45:03 Moat ntpd[1239]: Listen normally on 2 eth0 192.168.1.55 UDP 123
Aug 18 15:45:03 Moat acpid: starting up with proc fs
Aug 18 15:45:03 Moat acpid: skipping conf file /etc/acpi/events/.
Aug 18 15:45:03 Moat acpid: skipping conf file /etc/acpi/events/..
Aug 18 15:45:03 Moat acpid: 1 rule loaded
Aug 18 15:45:03 Moat acpid: waiting for events: event logging is off
Aug 18 15:45:03 Moat crond[1261]: /usr/sbin/crond 4.4 dillon's cron daemon, started with loglevel notice
Aug 18 15:45:06 Moat kernel: r8168: eth0: link up
Aug 18 15:45:06 Moat kernel: r8168: eth0: link up
Aug 18 15:45:25 Moat logger: installing plugin: *
Aug 18 15:45:25 Moat logger: Warning: simplexml_load_file(): I/O warning : failed to load external entity "/boot/config/plugins/ *.plg" in /usr/local/sbin/installplg on line 13
Aug 19 01:57:47 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0cac180)
... repeating ...
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: attempting task abort! scmd(f0dea180)
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: [sdm] CDB: cdb[0]=0x28: 28 00 f8 9f 44 d0 00 04 00 00
Aug 19 13:43:59 Moat kernel: scsi target1:0:7: handle(0x0010), sas_address(0x4433221105000000), phy(5)
Aug 19 13:43:59 Moat kernel: scsi target1:0:7: enclosure_logical_id(0x500304800ee2af00), slot(5)
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0dea180)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: attempting task abort! scmd(f0dea180)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: [sdm] CDB: cdb[0]=0x28: 28 00 f8 9f 44 d0 00 04 00 00
Aug 19 13:44:30 Moat kernel: scsi target1:0:7: handle(0x0010), sas_address(0x4433221105000000), phy(5)
Aug 19 13:44:30 Moat kernel: scsi target1:0:7: enclosure_logical_id(0x500304800ee2af00), slot(5)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0dea180)
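[Editor's note] A quick way to tell whether hdparm processes like the ones above are truly stuck on I/O, rather than just slow, is the process state column: uninterruptible sleep (state D) almost always means a request hung at the controller or drive, which would also explain why the array refuses to stop and powerdown fails. A minimal sketch of filtering for D-state PIDs; the sample snapshot below is illustrative, modeled on the ps output in the post, and on a live box you would pipe in `ps -eo pid,stat,comm` instead:

```shell
# Filter a ps snapshot for processes in uninterruptible sleep (state D).
# `sample` stands in for the output of: ps -eo pid,stat,comm
sample=' 5112 D    hdparm
 5123 S    s3.sh
 5150 Ds   hdparm'
# awk: field 2 is the state; any state beginning with D is I/O-stuck
stuck=$(echo "$sample" | awk '$2 ~ /^D/ {print $1}')
echo "$stuck"
```

D-state processes cannot be killed, even with `kill -9`; they only clear when the underlying I/O completes or the kernel aborts it, which matches the hard-reboot-only behaviour described above.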
RobJ Posted August 21, 2013

70% is roughly 2.1 TB. Are you sure your disk controller supports drives over 2TB? Check for a BIOS or firmware update, for the board or the card.
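[Editor's note] The limit RobJ is hinting at comes from 32-bit LBA addressing on older controllers and firmware: 2^32 sectors of 512 bytes is just under 2.2 TB. A back-of-the-envelope check of where that ceiling falls relative to the hang point; the sector count is the nominal figure for a 3 TB drive, and this is arithmetic for illustration, not a diagnosis:

```shell
# Where the 32-bit LBA ceiling falls, vs. a ~71% hang point on a 3 TB rebuild
limit_bytes=$(( (1 << 32) * 512 ))    # 2^32 sectors * 512 B
drive_bytes=$(( 5860533168 * 512 ))   # nominal 3 TB drive (sectors * 512 B)
hang_bytes=$(( drive_bytes * 71 / 100 ))
echo "2TiB limit: $limit_bytes bytes"
echo "hang point: $hang_bytes bytes"
```

The stall reported later in the thread at 2.13TB sits just below that ~2.2TB mark, close enough that an addressing limit in the controller, its firmware, or the BIOS is worth ruling out before suspecting the drive itself.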
BobPhoenix Posted August 21, 2013

What he said, most likely. But I had a freeze when unRAID was formatting a precleared drive the other night. When I canceled the preclear that was currently running (in its PreRead step), the format immediately started up again and finished within a few seconds to a minute. Long story short: were you doing anything else while rebuilding the drive?
dwoods99 Posted August 22, 2013

The parity drive is a 3TB Seagate, and there were already 2 x 3TB WD Green drives in the mix. No, I wasn't doing anything else; I was replacing a 2TB with a 3TB to provide more space in the array. I did not pre-clear it.

Edit: I checked, and there are 3TB drives on the motherboard ports as well as the new drive, but the new one was not using a blue SATA cable into the blue ports. Changed that and am trying the rebuild again.
dwoods99 Posted August 22, 2013

As expected, it made no difference. Stuck again with hung hdparm processes. I will try moving the drive connection to the SATA expansion boards.
RobJ Posted August 23, 2013

I would check a SMART report for the drive, to make sure it's OK. Then get an hdparm report ("hdparm -I /dev/sdx") and make sure it looks right for a 3TB drive. Feel free to post them for us to check. Then try preclearing it, to make sure the entire drive can be accessed successfully.

By the way, which drive is sdl? Can you zip the syslog and attach it?
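[Editor's note] On the live system those checks would be `smartctl -a /dev/sdX` and `hdparm -I /dev/sdX`. The line worth reading in the hdparm report is the device size: a 3TB drive should report roughly 3,000,000 MB, and a markedly smaller figure would point at the controller truncating the drive. A sketch of pulling that number out; the sample line is hypothetical but shaped like hdparm's usual capacity output:

```shell
# Extract the capacity from an `hdparm -I` style size line and sanity-check it.
# `sample` stands in for: hdparm -I /dev/sdl | grep 'device size with M = 1000'
sample='        device size with M = 1000*1000:     3000592 MBytes (3000 GB)'
size_mb=$(echo "$sample" | awk '{for (i = 1; i <= NF; i++) if ($i == "MBytes") print $(i-1)}')
if [ "$size_mb" -ge 2900000 ]; then
  echo "full 3TB capacity reported ($size_mb MB)"
else
  echo "capacity looks truncated ($size_mb MB) -- suspect the controller"
fi
```

If the controller-truncation theory from earlier in the thread were right, this is where it would show: the reported size would come out near 2.2TB instead of 3TB.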
dwoods99 Posted August 24, 2013

It seems that disk7 (3TB) checks out OK with smartctl, but disk8 (3TB) failed -- and these are both new drives this month. The problem is that the data rebuild is doing many writes on disk7 to rebuild and fix. I have a third 3TB drive, precleared and ready to use. Does it make sense to swap out disk7 with the new one, let it rebuild, and hopefully this time have no more hanging? Then, once that completes, preclear the old disk7 and swap it with disk8 and rebuild again. Then I'll be able to send disk8 back to the store. Obviously I'm trying to avoid losing any data on disk7 or disk8. Is this a good approach?
dgaschk Posted August 24, 2013

If disk 8 has failed SMART then it is the one that needs to be replaced.
dwoods99 Posted August 24, 2013

I understand that; however, since I see disk7 being the one trying to rebuild data, I am worried that removing disk8 and rebuilding it would cause data loss from disk7. Is that wrong?
dwoods99 Posted August 25, 2013

When I tried to replace disk8, I now get "Too many wrong and/or missing disks!" after selecting the new one. To me this means that my fears of losing data on disk7 *and* disk8 are valid. I don't think I can do anything else but replace disk7 first in order to rebuild the data, and then disk8. Open to other suggestions.
dgaschk Posted August 25, 2013

Replace the original disk 7 and set a new config. Assign all disks and indicate that parity is correct. Start the array, then stop the array and replace disk8.
RobJ Posted August 25, 2013

I've hesitated to speak up, because there hasn't been enough info provided to be sure of the situation. I've been hoping to see a syslog and SMART reports for the involved drives, to know for sure what is good and what has issues. The advice from dgaschk sounds like the right plan, but with 2 drives involved there's obviously a much higher chance of data loss if the situation proves to be a little different than we can gather from limited info.

You said "disk8 (3TB) failed", and that may be true, but respectfully, we can't know for sure how knowledgeable you are about determining that without knowing you ourselves. Also, a 'failed disk' can mean many things, from 'won't even spin up' to 'has an alarmingly high SMART attribute but still usable'. Knowing which could make a big difference. It would also be good to make sure we know the true state of both the new Disk 7 and the old 2TB Disk 7.

If you do have a bad Disk 8 and a perfectly good original Disk 7 (even if only 2TB), then dgaschk's plan sounds correct. You want to restore the original working array, postponing the expansion of Disk 7 until the rest of the array is fine.

What also confuses me a little is that the primary complaint in your original post was about the hung 'hdparm -C' processes, but they actually appear to apply to sdl, which is Disk 9! Is Disk 9 OK? I have no idea what the significance of those hung processes is, but it might be good to know for sure about the state of Disk 9 too.

Under no circumstances do you want to preclear the original Disk 7 until the new array is completely fine.
dwoods99 Posted August 26, 2013

> Replace the original disk 7 and set a new config. Assign all disks and indicate that parity is correct. Start the array then stop the array and replace disk8.

I have done this, but I still got "Too many wrong and/or missing disks!". I put disks 7 and 8 back the way they were, did a 'new config', and am now fixing parity. I am sure the contents of disk 7 were correct, so I am taking the risk that I won't lose data -- no other choice anymore.

@RobJ, it just seems that disk 8 was being spun up or down and hanging during the rebuild of data on disk 7, causing the rebuild to fail. As for disk 8 having 'failed', I was referring to the smartctl output when a short test was run from the simplefeatures web menu. I previously posted what were most likely the only relevant parts of the syslog that might indicate disk7/8 problems. I am no expert on smartctl but pretty knowledgeable with servers and unix. The original 2TB disk 7 has already been allocated and used in my second unRAID server.

Thanks to all who have been helping me with this problem.
dwoods99 Posted August 26, 2013

No luck! Once again it gets to 71% (2.13TB) and freezes -- the web interface no longer responds. The only thing left I can think of is to get a new 2TB drive and replace disk 7 with that, to force the rebuild or parity sync to end properly, and then replace disk 8. Any other ideas?
dgaschk Posted August 27, 2013

> I have done this, but I still got "Too many wrong and/or missing disks!".

After selecting New Config it's not possible to receive the message "Too many wrong and/or missing disks". This operation clears the disk assignments, and you must reassign all of the disks correctly. Assign the original 2TB disk 7 and the other disks, including parity, to their respective slots. Check the "Parity is correct" box and start the array. Now disk 8 can be replaced.
dwoods99 Posted September 3, 2013

Clicking "Parity is correct" still wouldn't work -- emhttp gets hung at 71% (2.13TB). I removed the 3TB disk7 drive and replaced it with a 2TB blank drive -- wrong size for a replacement disk. Next I forced a new config with the 2TB drive and rebuilt parity -- OK. I followed that by replacing disk 8 with the new 3TB drive and forcing a rebuild of data and parity -- OK. The system is OK now, except that the contents of disk 7 are obviously gone; however, I thought I should be able to mount that drive and copy its reiserfs contents onto the new 2TB drive. Problem with mounting...

# mkdir /mnt/ext
# mount -t reiserfs /dev/sdn /mnt/ext
mount: wrong fs type, bad option, bad superblock on /dev/sdn,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try dmesg | tail or so

# reiserfsck --check --rebuild-sb /dev/sdn
...
Do you want to rebuild the journal header? (y/n)[n]: y
Reiserfs super block in block 16 on 0x8d0 of format 3.6 with standard journal
Count of blocks on the device: 195695728
Number of bitmaps: 5973
Blocksize: 4096
Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0
Root block: 0
Filesystem is NOT clean
Tree height: 0
Hash function used to sort names: not set
Objectid map size 0, max 972
Journal parameters:
        Device [0x0]
        Magic [0x0]
        Size 8193 blocks (including 1 for journal header) (first block 18)
        Max transaction length 1024 blocks
        Max batch size 900 blocks
        Max commit age 30
Blocks reserved by journal: 0
Fs state field: 0x1: some corruptions exist.
sb_version: 2
inode generation number: 0
UUID: 2c9898c6-7c9d-4239-ad5a-f920802af9b5
LABEL:
Set flags in SB:
Mount count: 1
Maximum mount count: 30
Last fsck run: Tue Sep 3 11:45:58 2013
Check interval in days: 180
Is this ok ? (y/n)[n]: y
The fs may still be unconsistent. Run reiserfsck --check.
# reiserfsck --check /dev/sdn
reiserfsck --check started at Tue Sep 3 11:46:55 2013
###########
Replaying journal: Done.
Reiserfs journal '/dev/sdn' in blocks [18..8211]: 0 transactions replayed
Zero bit found in on-disk bitmap after the last valid bit.
Checking internal tree..
Bad root block 0. (--rebuild-tree did not complete)
Aborted

*** How do I get access to the files on the old disk 7 drive?
JonathanM Posted September 3, 2013

> # mount -t reiserfs /dev/sdn /mnt/ext

Try /dev/sdn1. I'm pretty sure you should be operating on the first partition, not the raw drive.
dwoods99 Posted September 3, 2013

I had tried

# mount -t reiserfs /dev/sdn1 /mnt/ext
mount: special device /dev/sdn1 does not exist

hence I tried

# reiserfsck --check /dev/sdn
reiserfs_open: the reiserfs superblock cannot be found on /dev/sdn.
Failed to open the filesystem.
dgaschk Posted September 3, 2013

The correct command is "reiserfsck --check /dev/sdn1"
dwoods99 Posted September 3, 2013

> The correct command is "reiserfsck --check /dev/sdn1"

As stated in previous post(s), sdn1 did not exist after connecting the hard disk via an external enclosure -- no more internal slots.

# dmesg | tail
sd 8:0:0:0: [sdn] No Caching mode page present
sd 8:0:0:0: [sdn] Assuming drive cache: write through
sdn: unknown partition table
sd 8:0:0:0: [sdn] No Caching mode page present
sd 8:0:0:0: [sdn] Assuming drive cache: write through
sd 8:0:0:0: [sdn] Attached SCSI disk
REISERFS warning (device sdn): sh-2021 reiserfs_fill_super: can not find reiserfs on sdn
FAT-fs (sdn): bogus number of reserved sectors
FAT-fs (sdn): Can't find a valid FAT filesystem
REISERFS warning (device sdn): sh-2021 reiserfs_fill_super: can not find reiserfs on sdn
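[Editor's note] The telling line in that dmesg output is "sdn: unknown partition table": the kernel never created /dev/sdn1 because it could not read a partition table through the enclosure. Some USB/SATA bridges present drives differently than a direct SATA port (for example, translating sector sizes on large drives), which can hide an otherwise valid partition layout. A small sketch of scanning dmesg-style output for that hint; the sample lines are copied from the post above:

```shell
# Scan dmesg-style output for signs the kernel found no partitions.
# `log` stands in for `dmesg | tail`; lines copied from the post.
log='sd 8:0:0:0: [sdn] Attached SCSI disk
sdn: unknown partition table
REISERFS warning (device sdn): sh-2021 reiserfs_fill_super: can not find reiserfs on sdn'
if echo "$log" | grep -q 'unknown partition table'; then
  echo "no partition table seen -- try the disk on a direct SATA port"
fi
```

If the partition does appear once the disk is on a direct SATA connection, the enclosure's translation layer is the likely culprit rather than the disk or filesystem.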
dwoods99 Posted September 4, 2013

Update: I decided to remove the old disk 7 drive from the enclosure and place it directly into an on-board SATA slot in the second server. I found that it showed up correctly as sdh/sdh1, and I was able to mount it as reiserfs onto /mnt/user/Movies. I added Movies as a new share via the web interface, even though it's not part of the array on the 2nd server. It shows up fine on my PC, so now I am copying the contents into server 1, where they are being allocated onto the 2TB drive. It will take a while, but it's a solution that's working now.
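[Editor's note] Once the old disk mounts cleanly, the rescue itself is an ordinary recursive copy. A minimal rehearsal of that step on throwaway directories; on the real machines the source would be the mounted old disk 7 and the destination the array share, and mounting the source read-only first is a sensible precaution:

```shell
# Rehearse the copy step with temp dirs standing in for the real mount points.
src=$(mktemp -d)   # stands in for the mounted old disk, e.g. /mnt/ext
dst=$(mktemp -d)   # stands in for the array share on server 1
mkdir -p "$src/Movies"
echo sample > "$src/Movies/film.mkv"
cp -a "$src/." "$dst/"   # -a preserves timestamps, permissions, and structure
ls "$dst/Movies"
```

`rsync -av "$src/" "$dst/"` does the same job and can be safely restarted if a multi-terabyte copy is interrupted partway.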
Archived
This topic is now archived and is closed to further replies.