hdparm -C is hung -- causing web interface to be inaccessible

dwoods99 · August 21, 2013

I replaced a 2TB WD green drive with a 3TB version and while doing a rebuild of the data,

it seems to hang after 70%. The web interface does not want to respond and all I can see

with ps -ef are processes like hdparm -C /dev/sdl that appear to be hung or defunct

root      5112  2190  0 06:40 ?        00:00:00 /usr/sbin/hdparm -C /dev/sdl
root      5113  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5114  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5115  2190  0 06:40 ?        00:00:00 [hdparm] <defunct>
root      5123  2193  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5124  5123  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5125  5124  0 06:41 ?        00:00:00 /bin/bash ./s3.sh
root      5126  5124  0 06:41 ?        00:00:00 wc -l
root      5150  5125  0 06:41 ?        00:00:00 hdparm -C /dev/sdl

I can't stop the array and I can't powerdown even from a telnet shell.

I've tried hard reboots but the problem persists. Any ideas?

I could not spot anything in the syslog

Aug 18 15:45:25 Moat emhttp: ST3000DM001-1CH166_####HVN (sda) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####6740 (sdb) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####3952 (sdc) 2930266584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####7613 (sdd) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####3254 (sdf) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EFRX-68AX9N0_WD-####0521 (sdg) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####1569 (sdh) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####9546 (sdi) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00S8B1_WD-####7510 (sdj) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARX-00PASB0_WD-####7189 (sdk) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD20EARS-00MVWB0_WD-####5888 (sdl) 1953514584
Aug 18 15:45:25 Moat emhttp: WDC_WD30EZRX-00DC0B0_WD-####3235 (sdm) 2930266584
Aug 18 15:45:25 Moat kernel: mdcmd (1): import 0 8,0 2930266532 ST3000DM001-1CH166_####QHVN
Aug 18 15:45:25 Moat kernel: md: import disk0: [8,0] (sda) ST3000DM001-1CH166_####QHVN size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (2): import 1 8,96 1953514552 WDC_WD20EFRX-68AX9N0_WD-####0521
Aug 18 15:45:25 Moat kernel: md: import disk1: [8,96] (sdg) WDC_WD20EFRX-68AX9N0_WD-####0521 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (3): import 2 8,112 1953514552 WDC_WD20EARS-00MVWB0_WD-####1569
Aug 18 15:45:25 Moat kernel: md: import disk2: [8,112] (sdh) WDC_WD20EARS-00MVWB0_WD-####1569 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (4): import 3 8,80 1953514552 WDC_WD20EARS-00MVWB0_WD-####3254
Aug 18 15:45:25 Moat kernel: md: import disk3: [8,80] (sdf) WDC_WD20EARS-00MVWB0_WD-####3254 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (5): import 4 8,128 1953514552 WDC_WD20EARS-00MVWB0_WD-####9546
Aug 18 15:45:25 Moat kernel: md: import disk4: [8,128] (sdi) WDC_WD20EARS-00MVWB0_WD-####9546 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (6): import 5 8,144 1953514552 WDC_WD20EARS-00S8B1_WD-####7510
Aug 18 15:45:25 Moat kernel: md: import disk5: [8,144] (sdj) WDC_WD20EARS-00S8B1_WD-####7510 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (7): import 6 8,48 1953514552 WDC_WD20EARS-00MVWB0_WD-####7613
Aug 18 15:45:25 Moat kernel: md: import disk6: [8,48] (sdd) WDC_WD20EARS-00MVWB0_WD-####7613 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (: import 7 8,32 2930266532 WDC_WD30EZRX-00DC0B0_WD-####3952
Aug 18 15:45:25 Moat kernel: md: import disk7: [8,32] (sdc) WDC_WD30EZRX-00DC0B0_WD-####3952 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (9): import 8 8,192 2930266532 WDC_WD30EZRX-00DC0B0_WD-####3235
Aug 18 15:45:25 Moat kernel: md: import disk8: [8,192] (sdm) WDC_WD30EZRX-00DC0B0_WD-####3235 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (10): import 9 8,176 1953514552 WDC_WD20EARS-00MVWB0_WD-####5888
Aug 18 15:45:25 Moat kernel: md: import disk9: [8,176] (sdl) WDC_WD20EARS-00MVWB0_WD-####5888 size: 1953514552
Aug 18 15:45:25 Moat kernel: mdcmd (11): import 10 8,16 2930266532 WDC_WD30EZRX-00DC0B0_WD-####6740
Aug 18 15:45:25 Moat kernel: md: import disk10: [8,16] (sdb) WDC_WD30EZRX-00DC0B0_WD-####6740 size: 2930266532
Aug 18 15:45:25 Moat kernel: mdcmd (12): import 11 8,160 1953514552 WDC_WD20EARX-00PASB0_WD-####7189
Aug 18 15:45:25 Moat kernel: md: import disk11: [8,160] (sdk) WDC_WD20EARX-00PASB0_WD-####7189 size: 1953514552

Aug 18 15:45:03 Moat kernel: sd 1:0:4:0: [sdj] Attached SCSI disk
Aug 18 15:45:03 Moat kernel: sd 1:0:5:0: [sdk] Attached SCSI disk
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig lo 127.0.0.1
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1:  /sbin/route add -net 127.0.0.0 netmask 255.0.0.0 lo
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1:  /sbin/ifconfig eth0 192.168.1.55 broadcast 192.168.1.255 netmask 255.255.255.0
Aug 18 15:45:03 Moat kernel: r8168: eth0: link down
Aug 18 15:45:03 Moat logger: /etc/rc.d/rc.inet1:  /sbin/route add default gw 192.168.1.1 metric 1
Aug 18 15:45:03 Moat rpc.statd[1222]: Version 1.2.2 starting
Aug 18 15:45:03 Moat sm-notify[1223]: Version 1.2.2 starting
Aug 18 15:45:03 Moat rpc.statd[1222]: Failed to read /var/lib/nfs/state: Success
Aug 18 15:45:03 Moat rpc.statd[1222]: Initializing NSM state
Aug 18 15:45:03 Moat rpc.statd[1222]: Running as root.  chown /var/lib/nfs to choose different user
Aug 18 15:45:03 Moat ntpd[1238]: ntpd [email protected] Sat Apr 24 19:01:14 UTC 2010 (1)
Aug 18 15:45:03 Moat ntpd[1239]: proto: precision = 0.260 usec
Aug 18 15:45:03 Moat ntpd[1239]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Aug 18 15:45:03 Moat ntpd[1239]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Aug 18 15:45:03 Moat ntpd[1239]: Listen normally on 1 lo 127.0.0.1 UDP 123
Aug 18 15:45:03 Moat ntpd[1239]: Listen normally on 2 eth0 192.168.1.55 UDP 123
Aug 18 15:45:03 Moat acpid: starting up with proc fs
Aug 18 15:45:03 Moat acpid: skipping conf file /etc/acpi/events/.
Aug 18 15:45:03 Moat acpid: skipping conf file /etc/acpi/events/..
Aug 18 15:45:03 Moat acpid: 1 rule loaded
Aug 18 15:45:03 Moat acpid: waiting for events: event logging is off
Aug 18 15:45:03 Moat crond[1261]: /usr/sbin/crond 4.4 dillon's cron daemon, started with loglevel notice
Aug 18 15:45:06 Moat kernel: r8168: eth0: link up
Aug 18 15:45:06 Moat kernel: r8168: eth0: link up
Aug 18 15:45:25 Moat logger: installing plugin: *
Aug 18 15:45:25 Moat logger:
Aug 18 15:45:25 Moat logger: Warning: simplexml_load_file(): I/O warning : failed to load external entity "/boot/config/plugins/
*.plg" in /usr/local/sbin/installplg on line 13
Aug 19 01:57:47 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0cac180)
... repeating ...
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: attempting task abort! scmd(f0dea180)
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: [sdm] CDB: cdb[0]=0x28: 28 00 f8 9f 44 d0 00 04 00 00
Aug 19 13:43:59 Moat kernel: scsi target1:0:7: handle(0x0010), sas_address(0x4433221105000000), phy(5)
Aug 19 13:43:59 Moat kernel: scsi target1:0:7: enclosure_logical_id(0x500304800ee2af00), slot(5)
Aug 19 13:43:59 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0dea180)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: attempting task abort! scmd(f0dea180)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: [sdm] CDB: cdb[0]=0x28: 28 00 f8 9f 44 d0 00 04 00 00
Aug 19 13:44:30 Moat kernel: scsi target1:0:7: handle(0x0010), sas_address(0x4433221105000000), phy(5)
Aug 19 13:44:30 Moat kernel: scsi target1:0:7: enclosure_logical_id(0x500304800ee2af00), slot(5)
Aug 19 13:44:30 Moat kernel: sd 1:0:7:0: task abort: SUCCESS scmd(f0dea180)

RobJ · August 21, 2013

70% is roughly at about 2.1 TB, are you sure your disk controller supports drives over 2TB? Check for a BIOS or firmware update, for board or card.

BobPhoenix · August 21, 2013

What he said - most likely.

But I had a freeze when unRAID was formatting a precleared drive the other night. When I canceled the preclear that was currently running (PreRead step) the format immediately started up again and was finished within a few seconds to a minute.

So long story short were you doing anything else while rebuilding the drive?

dwoods99 · August 22, 2013

The parity drive is 3TB Seagate and there were already 2 x 3TB WD Green drives in the mix.

No, I was replacing a 2TB with a 3TB to provide more space in the array. I did not pre-clear it.

Edit: I checked and 3TB drives are on motherboard as well as new drive but new one is not

using blue sata cable into blue ports. Changed and trying rebuild again.

dwoods99 · August 22, 2013

As expected it made no difference. Stuck again with hung hdparm processes.

I will try to move drive connection to the sata expansion boards.

RobJ · August 23, 2013

I would check a SMART report for the drive, make sure it's OK. Then get an hdparm report ("hdparm -I /dev/sdx"), and make sure it looks right for a 3TB drive. Feel free to post them for us to check. Then try Preclearing it, to make sure the entire drive can be accessed successfully.

By the way, which drive is sdl? Can you zip the syslog and attach it?

dwoods99 · August 24, 2013

It seems that disk7 (3TB) checks out ok with smartctl but disk8 (3TB) failed -- these are both new drives this month.

Problem is that the data rebuild is doing many writes on disk7 to rebuild and fix.

I have a third 3TB drive, pre-cleared and ready to use.

Does it make sense to swap out disk7 with the new one, let it rebuild and hopefully this time no more hanging.

And then once completed, pre-clear the old disk7, and swap it with disk8 and rebuild again.

Then I'll be able to send disk8 back to the store.

Obviously I'm trying to avoid losing any data in disk7 or disk8.

Is this a good approach?

dgaschk · August 24, 2013

If disk 8 has Failed SMART then it is the one that needs to be replaced.

dwoods99 · August 24, 2013

I understand that however since I see disk7 being the one trying to rebuild data,

I am worried that removing disk8 and rebuilding it would cause data loss from disk7.

Is that wrong?

dwoods99 · August 25, 2013

When I tried to replace disk8, I now get Too many wrong and/or missing disks! after selecting the new one.

To me this means that my fears of losing data on disk7 *and* disk8 are valid.

I don't think I can do anything else but replace disk7 first in order to rebuild the data, and then disk8.

Open to other suggestions.

dgaschk · August 25, 2013

Replace the original disk 7 and set a new config. Assign all disks and indicate that parity is correct. Start the array then stop the array and replace disk8.

RobJ · August 25, 2013

I've hesitated to speak up, because there hasn't been enough info provided to be sure of the situation. I've been hoping to see a syslog and SMART reports for the involved drives, to know for sure what is good and what has issues. The advice from dgaschk sounds like the right plan, but with 2 drives involved, there's obviously a much higher chance of data loss, if the situation proves to be a little different than we can gather from limited info. You said "disk8 (3TB) failed", and that may be true, but respectfully we can't know for sure how knowledgeable you are about determining that without knowing you ourselves, and also a 'failed disk' can mean many things, from 'won't even spin up' to 'has an alarmingly high SMART attribute but still useable'. Knowing that could make a big difference. Also it would be good to make sure we know the true state of both the new Disk 7 and the old 2TB Disk 7. If you do have a bad Disk 8 and a perfectly good original Disk 7 (even if only 2 TB), then dgaschk's plan sounds correct. You want to restore the original working array, postponing the expansion of Disk 7, until the rest of the array is fine.

What also confuses me a little is that the primary complaint in your original post was about the hung 'hdparm -C' processes, but they actually appear to apply to sdl, which is Disk 9! Is Disk 9 OK? I have no idea what the significance of those hung processes are, but it might be good to know for sure about the state of Disk 9 too.

Under no circumstances do you want to Preclear the original Disk 7, until the new array is completely fine.

dwoods99 · August 26, 2013

Replace the original disk 7 and set a new config. Assign all disks and indicate that parity is correct. Start the array then stop the array and replace disk8.

I have done this but I still got Too many wrong and/or missing disks!.

I put disk 7 and 8 back the way it was, did 'new config' and now fixing parity.

I am sure the contents of disk 7 was correct so I am taking the risk that I won't lose data -- no other choice anymore.

@RobJ,

It just seems that disk 8 was being spun up or down and hanging during rebuild of data on disk 7, causing the rebuild to fail.

As for disk 8 'failed', I was referring to smartctl output when short test run from the simplefeatures web menu.

I previously posted what was most likely the only relevant parts to the syslog that might indicate disk7/8 problems.

I am no expert on smartctl but pretty knowledgeable with servers and unix.

The original 2TB disk 7 has already been allocated and used in my second unRaid server.

Thanks to all who have been helping me with this problem.

dwoods99 · August 26, 2013

No luck!

Once again it gets to 71% 2.13TB and freezes -- web interface no longer wants to respond.

Only thing left I can think of to do is get a new 2TB drive and replace disk 7 with that to force the rebuild or parity sync to end properly --- and then replace disk 8.

Any other ideas?

dgaschk · August 27, 2013

Replace the original disk 7 and set a new config. Assign all disks and indicate that parity is correct. Start the array then stop the array and replace disk8.

I have done this but I still got Too many wrong and/or missing disks!.

I put disk 7 and 8 back the way it was, did 'new config' and now fixing parity.

I am sure the contents of disk 7 was correct so I am taking the risk that I won't lose data -- no other choice anymore.

@RobJ,

It just seems that disk 8 was being spun up or down and hanging during rebuild of data on disk 7, causing the rebuild to fail.

As for disk 8 'failed', I was referring to smartctl output when short test run from the simplefeatures web menu.

I previously posted what was most likely the only relevant parts to the syslog that might indicate disk7/8 problems.

I am no expert on smartctl but pretty knowledgeable with servers and unix.

The original 2TB disk 7 has already been allocated and used in my second unRaid server.

Thanks to all who have been helping me with this problem.

After selecting New Config it's not possible to receive the message "Too many wrong and/or missing disks". This operation clears the disk assignments and you must reassign all of the disks correctly. Assign the original 2T disk 7 and the other disks including parity to their respective slots. Check the "Parity is correct" box and start the array. Now disk 8 can be replaced...

dwoods99 · September 3, 2013

Clicking "Parity is correct" still wouldn't work -- emhttp gets hungs at 71% 2.13TB size.

I removed the 3TB disk7 drive and replaced with a 2TB blank drive -- wrong size for replacement disk.

Next I forced a new config with the 2TB drive, rebuilt parity -- ok.

Followed by replacing disk 8 with the new 3TB drive, force rebuilding of data and parity -- ok.

System ok now except obviously contents of disk 7 is gone, however I thought I should be able to mount that

drive and copy contents of reiserfs files onto the new 2TB drive.

Problem with mounting...

# mkdir /mnt/ext

# mount -t reiserfs /dev/sdn /mnt/ext

mount: wrong fs type, bad option, bad superblock on /dev/sdn,

missing codepage or helper program, or other error

In some cases useful info is found in syslog - try dmesg | tail or so

# reiserfsck --check --rebuild-sb /dev/sdn

...

Do you want to rebuild the journal header? (y/n)[n]: y

Reiserfs super block in block 16 on 0x8d0 of format 3.6 with standard journal

Count of blocks on the device: 195695728

Number of bitmaps: 5973

Blocksize: 4096

Free blocks (count of blocks - used [journal, bitmaps, data, reserved] blocks): 0

Root block: 0

Filesystem is NOT clean

Tree height: 0

Hash function used to sort names: not set

Objectid map size 0, max 972

Journal parameters:

Device [0x0]

Magic [0x0]

Size 8193 blocks (including 1 for journal header) (first block 18)

Max transaction length 1024 blocks

Max batch size 900 blocks

Max commit age 30

Blocks reserved by journal: 0

Fs state field: 0x1:

some corruptions exist.

sb_version: 2

inode generation number: 0

UUID: 2c9898c6-7c9d-4239-ad5a-f920802af9b5

LABEL:

Set flags in SB:

Mount count: 1

Maximum mount count: 30

Last fsck run: Tue Sep 3 11:45:58 2013

Check interval in days: 180

Is this ok ? (y/n)[n]: y

The fs may still be unconsistent. Run reiserfsck --check.

# reiserfsck --check /dev/sdn

reiserfsck --check started at Tue Sep 3 11:46:55 2013

###########

Replaying journal: Done.

Reiserfs journal '/dev/sdn' in blocks [18..8211]: 0 transactions replayed

Zero bit found in on-disk bitmap after the last valid bit.

Checking internal tree..

Bad root block 0. (--rebuild-tree did not complete)

Aborted

***

How do I get access to the files on the old disk 7 drive?

JonathanM · September 3, 2013

# mount -t reiserfs /dev/sdn /mnt/ext

Try /dev/sdn1. I'm pretty sure you should be operating on the first partition, not the raw drive.

dwoods99 · September 3, 2013

I had tried

# mount -t reiserfs /dev/sdn1 /mnt/ext

mount: special device /dev/sdn1 does not exist

hence I tried

# reiserfsck --check /dev/sdn

reiserfs_open: the reiserfs superblock cannot be found on /dev/sdn.

Failed to open the filesystem.

dgaschk · September 3, 2013

The correct command is "reiserfsck --check /dev/sdn1"

dwoods99 · September 3, 2013

The correct command is "reiserfsck --check /dev/sdn1"

As stated in previous post(s), sdn1 did not exist after connecting hard disk via an external enclosure -- no more internal slots.

# dmesg |tail
sd 8:0:0:0: [sdn] No Caching mode page present
sd 8:0:0:0: [sdn] Assuming drive cache: write through
sdn: unknown partition table
sd 8:0:0:0: [sdn] No Caching mode page present
sd 8:0:0:0: [sdn] Assuming drive cache: write through
sd 8:0:0:0: [sdn] Attached SCSI disk
REISERFS warning (device sdn): sh-2021 reiserfs_fill_super: can not find reiserfs on sdn
FAT-fs (sdn): bogus number of reserved sectors
FAT-fs (sdn): Can't find a valid FAT filesystem
REISERFS warning (device sdn): sh-2021 reiserfs_fill_super: can not find reiserfs on sdn

dwoods99 · September 4, 2013

update:

I decided to remove the old disk 7 drive from the enclosure, and place it directly to an on-board sata slot in the second server.

Found that it showed up correctly as sdh/sdh1, able to mount it as reiserfs type into /mnt/user/Movies

Added Movies as a new share via web interface -- even though it's not part of the array on the 2nd server.

Shows up fine on my PC, so now I am copying the contents into server 1 and it's being allocated into the 2TB drive.

It will take a while but it's a solution that's working now.

hdparm -C is hung -- causing web interface to be inaccessible

Recommended Posts

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Link to comment

Archived