Unable to stop the array from the GUI and put it into maintenance mode



When I check the box and then hit "Stop", it appears to go through the steps to stop the array, but the array never shows as stopped in the GUI.

 

It appears to have actually stopped the array, as none of the user shares is available now, but the GUI won't update, which means I can't put the array into maintenance mode.

 

I've had lots of issues with unformatted drives, red balls, cache drive, etc, and after getting an initial response from Tom via email asking for more info, and telling me not to do anything, I've not heard anything else for about 3 days, so I guess I need to just figure out a resolution on my own. 

 

That seems to need to start with me running reiserfsck on the unformatted drive.  For that, it seems I need to put the array into maintenance mode, which I'm unable to do.

 

I wish I could get more help from Tom, but he's busy getting beta10 ready.

 

So, does anyone else know how to force the array into maintenance mode, so I can try to recover my drive?

 

thanks in advance.

Link to comment

From a shell you can try a "df -k" to check for mounted filesystems to see if any are still mounted.  My guess is there is a process that has a file/directory in use such as a shell.  You might be able to manually unmount any left over or check to see what PID has the filesystem in use.

 

Worst case you may have to reboot to kill the process and then put it in maintenance mode.
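
As a rough sketch of the checks described above: the helper below lists what is still mounted under /mnt, and the follow-up commands are left as comments because they need root on the server itself. `still_mounted` is a hypothetical helper name, `/mnt/disk9` is only an example mount point, and `fuser` is assumed to be installed, which may not be true on every unRAID build.

```shell
# List array-related mount points that are still mounted.
# Reads /proc/mounts-style lines (device mountpoint fstype ...) on stdin.
still_mounted() {
    awk '$2 ~ /^\/mnt\/(disk[0-9]+|user|cache)/ { print $2 }'
}

# On the server (as root):
#   still_mounted < /proc/mounts
#   fuser -vm /mnt/disk9    # show the processes (PIDs) holding the filesystem
#   umount /mnt/disk9       # then try the manual unmount
```

A shell whose current directory is inside the mount point is enough to keep it busy, so it is worth running `cd /` in any open sessions first.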

Link to comment

thanks for the info.

 

So, if it's not really unformatted, that should be 'good', right?

 

How would I get unRAID to recognize it as such?  I've restarted the server a couple of times since it started showing this way, but from putty (shutdown -r now), since I can't stop the array to do it from the GUI.

 

I really don't want to make things worse than they already are.

 

I just saw someone question the fact that the upgrade now shows 10x1 as an option in the extensions screen, so maybe beta10 is ready for release now, and maybe Tom will get back to me, but my server is pretty useless to me, and I'm wasting time I could spend on finding and fixing duplicates by just waiting for Tom to get back to me.

 

I don't want to sound like I'm slighting Tom.  Getting the next beta ready is more important than one person's problem, but it's frustrating when you happen to be that one person with the problem :)

Link to comment

Actually I guess the /mnt/diskX are really meta devices, so that still could be just the emulation of that drive.  To see the real drive you would have to stop the array and mount the actual device to see what is really on the drive.

 

Yeah, it is always tough dealing with storage issues.  There are usually several routes, and once you do something, you usually can't undo it...

 

 

Link to comment

hmm, it's actually the meta drive I want to access, since I've moved about 1/2 of the data off of that drive before it went to unformatted.  If I 'fix' the actual drive, it will probably show all the files, including the ones I've already moved elsewhere.

 

However, I'd rather deal with finding and fixing duplicates than having to figure out what I'm missing altogether and have to re-create the missing data.

 

Your idea to umount disk9 did work, and I was able to stop the array from the GUI afterwards.

 

I decided to try restarting unRAID after this to see if it might 'find' disk9 again, and show it as formatted, but that didn't work.  When it restarted, it still showed as unformatted.

 

I guess I will get it stopped again, and run reiserfsck on disk9 and get to work on fixing duplicates, assuming it finds anything on the disk at all.

Link to comment

hmmm, so if I run

 

reiserfsck --check /dev/md9

 

are you saying that it will check the virtual/emulated disk, and not the actual hard drive?  If so, that's actually a good thing for me, as I want to 'fix' the emulated disk, so I can get it to show only the files that remain after I moved about 1/2 of them off the emulated drive.

 

I guess I'll find out soon, since I did run that command, and it's processing as I type.

 

thanks again for all your help with all my issues the last week or so!!

Link to comment

Yes - if a disk is red-balled then unRAID has stopped writing to it, and anything you run against the 'md' device is working against the emulated disk.  You need to do a disk rebuild to get the data written to a real physical disk.

Link to comment

So, here are the results.  I don't know what my best next action would be...

 

root@media:~# reiserfsck --check /dev/md9

reiserfsck 3.6.24

 

Will read-only check consistency of the filesystem on /dev/md9

Will put log info to 'stdout'

 

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Sat Sep 27 10:32:37 2014

###########

Replaying journal: Done.

Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed

Checking internal tree.. \/  7 (of  16\/ 45 (of 103//  1 (of 170-block 190808067: The number of items (6) is incorrect, should be (0)

the problem in the internal node occured (190808067), whole subtree is skipped                                          / 12 (of  16// 27 (of 170// 98 (of 170\block 440476206: The level of the node (29666) is not correct, (1) expected

the problem in the internal node occured (440476206), whole subtree is skipped                                        finished

Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.

Bad nodes were found, Semantic pass skipped

2 found corruptions can be fixed only when running with --rebuild-tree

###########

reiserfsck finished at Sat Sep 27 11:40:34 2014

###########

 

Link to comment

You now want to run with the --rebuild-tree option to get a valid file system back.  Most of the time this gets back virtually all the files, but if there was severe corruption then some will be missing.

 

There is another option that can be used with --rebuild-tree, which is --scan-whole-partition.  This reads every sector on the disk looking for what look like files.  It can recover more data than running without it, but it can also result in spurious (e.g. partial or deleted) files being found and added to the lost+found folder.
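
To make the two variants concrete, here is a sketch that only prints the command lines for review rather than running them. `rebuild_cmd` is a hypothetical helper, and `/dev/md9` is the device used earlier in this thread; substitute your own.

```shell
# Build the reiserfsck command line for a device.
# A second argument of "scan" adds --scan-whole-partition.
# The command is only printed, so it can be checked before being run by hand.
rebuild_cmd() {
    dev=$1
    mode=${2:-normal}
    if [ "$mode" = "scan" ]; then
        echo "reiserfsck --rebuild-tree --scan-whole-partition $dev"
    else
        echo "reiserfsck --rebuild-tree $dev"
    fi
}

rebuild_cmd /dev/md9         # prints: reiserfsck --rebuild-tree /dev/md9
rebuild_cmd /dev/md9 scan    # prints: reiserfsck --rebuild-tree --scan-whole-partition /dev/md9
```

Starting with the plain --rebuild-tree and only falling back to the scan if files are still missing keeps the number of spurious entries in lost+found down.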

Link to comment

thanks.  The wiki is very specific that it has to be done 'right', so to confirm I should run this...

 

reiserfsck --rebuild-tree /dev/md9

 

exactly as shown above, correct?

 

Once done, it should show the drive in the GUI as formatted again, but still red-balled, but that should allow me to copy/move files off of that disk onto another disk in my array, leaving disk9 essentially 'empty'.

 

Once that's done, I should be able to format disk9 (as XFS), then it can be added back into the array, and I should be good to go again.

 

Does that sound about right, or am I missing something?

 

thanks again

Link to comment

Basically correct.

 

I'm not sure that simply reformatting the disk will remove the red-ball status!  I suspect you may still have to go through a rebuild step to clear this state.  Note, however, that as far as I know you can do the reformat to XFS on the 'virtual' disk before the rebuild.  This would have the advantage of being quick, and if you wanted to, you could start moving files back during the rebuild process (although it would slow down the rebuild).  You may prefer to wait for the rebuild to finish.

Link to comment

bad news it seems...

 

root@media:~# reiserfsck --rebuild-tree /dev/md9
reiserfsck 3.6.24

*************************************************************
** Do not  run  the  program  with  --rebuild-tree  unless **
** something is broken and MAKE A BACKUP  before using it. **
** If you have bad sectors on a drive  it is usually a bad **
** idea to continue using it. Then you probably should get **
** a working hard drive, copy the file system from the bad **
** drive  to the good one -- dd_rescue is  a good tool for **
** that -- and only then run this program.                 **
*************************************************************

Will rebuild the filesystem (/dev/md9) tree
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
Replaying journal: Done.
Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed
###########
reiserfsck --rebuild-tree started at Sat Sep 27 13:15:12 2014
###########

Pass 0:
####### Pass 0 #######
Loading on-disk bitmap .. ok, 376654954 blocks marked used
Skipping 30567 blocks (super block, journal, bitmaps) 376624387 blocks will be read
0%..block 67015897: The number of items (1) is incorrect, should be (0) - corrected
block 67015897: The free space (17) is incorrect, should be (4072) - corrected
block 68140233: The number of items (6144) is incorrect, should be (1) - corrected
block 68140233: The free space (59648) is incorrect, should be (464) - corrected
pass0: vpf-10110: block 68140233, item (0): Unknown item type found [201326593 385872640 0x49ffff00  (15)] - deleted
block 70212653: The number of items (65521) is incorrect, should be (1) - corrected
block 70212653: The free space (33) is incorrect, should be (2037) - corrected
pass0: vpf-10110: block 70212653, item (0): Unknown item type found [4293459968 92209184 0xca908bf  (15)] - deleted
block 71292588: The number of items (1107) is incorrect, should be (1) - corrected
block 71292588: The free space (165) is incorrect, should be (2144) - corrected
pass0: vpf-10110: block 71292588, item (0): Unknown item type found [69599234 155844917 0x6540a03  (15)] - deleted
block 72139248: The number of items (1) is incorrect, should be (0) - corrected
block 72139248: The free space (7) is incorrect, should be (4072) - corrected
block 73023393: The number of items (1024) is incorrect, should be (1) - corrected
block 73023393: The free space (512) is incorrect, should be (3280) - corrected
pass0: vpf-10110: block 73023393, item (0): Unknown item type found [67109119 50332160 0x1e000c00  (15)] - deleted
block 73111698: The number of items (65414) is incorrect, should be (1) - corrected
block 73111698: The free space (256) is incorrect, should be (3793) - corrected
pass0: vpf-10110: block 73111698, item (0): Unknown item type found [67174655 277938189 0x20007  (15)] - deleted
block 73344453: The number of items (1024) is incorrect, should be (1) - corrected
block 73344453: The free space (768) is incorrect, should be (3792) - corrected
pass0: vpf-10110: block 73344453, item (0): Unknown item type found [83886082 33555968 0xad000b00  (15)] - deleted
block 73481095: The number of items (1) is incorrect, should be (0) - corrected
block 73481095: The free space (4) is incorrect, should be (4072) - corrected
block 73559406: The number of items (1) is incorrect, should be (0) - corrected
block 73559406: The free space (65442) is incorrect, should be (4072) - corrected
.block 76566909: The number of items (1280) is incorrect, should be (1) - corrected
block 76566909: The free space (256) is incorrect, should be (3792) - corrected
pass0: vpf-10110: block 76566909, item (0): Unknown item type found [50331905 16777472 0xde000e00  (15)] - deleted
block 77580715: The number of items (1) is incorrect, should be (0) - corrected
block 77580715: The free space (65480) is incorrect, should be (4072) - corrected
block 79472804: The number of items (2304) is incorrect, should be (1) - corrected
block 79472804: The free space (4352) is incorrect, should be (3536) - corrected
pass0: vpf-10110: block 79472804, item (0): Unknown item type found [117441026 134222080 0x24001300  (15)] - deleted
block 79475198: The number of items (768) is incorrect, should be (1) - corrected
block 79475198: The free space (63744) is incorrect, should be (2768) - corrected
pass0: vpf-10110: block 79475198, item (0): Unknown item type found [33554433 218103296 0x95000a00  (15)] - deleted
block 80145536: The number of items (768) is incorrect, should be (1) - corrected
block 80145536: The free space (256) is incorrect, should be (1488) - corrected
pass0: vpf-10110: block 80145536, item (0): Unknown item type found [83886849 167772672 0x70000600  (15)] - deleted
.20%                                            left 292991467, 26021 /sec
The problem has occurred looks like a hardware problem. If you have
bad blocks, we advise you to get a new hard drive, because once you
get one bad block  that the disk  drive internals  cannot hide from
your sight,the chances of getting more are generally said to become
much higher  (precise statistics are unknown to us), and  this disk
drive is probably not expensive enough  for you to you to risk your
time and  data on it.  If you don't want to follow that follow that
advice then  if you have just a few bad blocks,  try writing to the
bad blocks  and see if the drive remaps  the bad blocks (that means
it takes a block  it has  in reserve  and allocates  it for use for
of that block number).  If it cannot remap the block,  use badblock
option (-B) with  reiserfs utils to handle this block correctly.

bread: Cannot read the block (122410614): (Input/output error).

Aborted

 

suggestions?

Link to comment

That is the commonest response from reiserfsck if a drive comes up as unformatted.  You need to run it with --rebuild-tree, as suggested, to fix the issues.  In most cases this clears the problem with virtually no data loss.

 

Note that at this point, since the drive is red-balled, you are writing to the emulated 'md9' device - not the physical drive.  You still have the physical drive (I would remove it from the server to play safe) to fall back on.
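
For a log that aborts with "bread: Cannot read the block (122410614)", the failing block can be re-tested on its own without writing anything. The sketch below converts the block number into a byte offset and a 512-byte sector number. The 4096-byte block size is an assumption (the ReiserFS default), not something read from the disk, and the offsets are relative to the start of /dev/md9, not to the raw drive.

```shell
# Convert a filesystem block number into a byte offset and a 512-byte sector.
# BLOCK_SIZE=4096 is an assumption (the ReiserFS default).
BLOCK=122410614
BLOCK_SIZE=4096
BYTE_OFFSET=$(( BLOCK * BLOCK_SIZE ))
SECTOR=$(( BYTE_OFFSET / 512 ))
echo "block $BLOCK -> byte offset $BYTE_OFFSET, sector $SECTOR"

# Read-only re-test of that single block on the emulated device (as root):
#   dd if=/dev/md9 of=/dev/null bs=$BLOCK_SIZE skip=$BLOCK count=1
# An "Input/output error" here would reproduce the failure in isolation.
```

Because md9 is emulated from parity plus the other data disks while the drive is red-balled, a read error at this offset may come from any of those disks, so the syslog is worth checking at the time of the failed read.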

Link to comment

screenshot attached.

 

I ran smartctl -a -d ata /dev/sdg >/boot/smart.txt, but have not manually forced a SMART test recently.  Not sure if this report is valid, or useful in this case.

 

smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.2-unRAID] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD30EZRX-00D8PB0
Serial Number:    WD-WCC4N1252623
LU WWN Device Id: 5 0014ee 25f94d9ce
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Sep 27 17:20:55 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
				was completed without error.
				Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
				without error or no self-test has ever 
				been run.
Total time to complete Offline 
data collection: 		(40560) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
				Auto Offline data collection on/off support.
				Suspend Offline collection upon new
				command.
				Offline surface scan supported.
				Self-test supported.
				Conveyance Self-test supported.
				Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
				power-saving mode.
				Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
				General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 407) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x7035)	SCT Status supported.
				SCT Feature Control supported.
				SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   178   177   021    Pre-fail  Always       -       6058
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       170
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       697
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3836
194 Temperature_Celsius     0x0022   122   113   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[Attached screenshot: gui.jpg]
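
Since no self-test has been logged in the report above, a quick way to make it more useful is to pull out the sector-related attributes and then run a short self-test. This is only a sketch: `smart_sector_attrs` is a hypothetical helper, and /dev/sdg and /boot/smart.txt are simply the device and file names from the command above.

```shell
# Print the raw values of the SMART attributes usually checked for failing sectors.
# Reads a saved "smartctl -a" report on stdin; RAW_VALUE is the 10th column.
smart_sector_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2 "=" $10 }'
}

# On the server (as root):
#   smart_sector_attrs < /boot/smart.txt
#   smartctl -t short /dev/sdg       # start a short self-test (about 2 minutes per the report)
#   smartctl -l selftest /dev/sdg    # read the self-test log afterwards
```

All three attributes are 0 in this report, which is what a healthy drive would show, but a completed self-test gives a stronger signal than the attribute table alone.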

Link to comment

So, I assume the drive is screwed beyond remedy, or at least the parity representation of it; correct?

 

If I just pull the actual drive, stick it in another machine, run a parity check on it, then stick it back into the server, I still couldn't rebuild the parity representation onto this drive, correct?

 

Is there anything I can do at this point to retrieve the data off of disk9, or do I just need to move on?

 

I know I probably sound desperate, but I'm sick of not having my server, and I'm disappointed that Tom has not responded to my requests for help, so I really just want to put this behind me and get whatever remains of my data back into use, as some is better than nothing.

 

Any ideas?

Link to comment

I would stop the array completely and mount the /dev/sdX1 partition; you should then be able to see the data on the drive.  If the drive mounts fine and the data looks good, I think I would just start a new config and put all the drives back in as they were, so that at least you have all the original disk9 data in the array.  Then you would just need to worry about the dups.
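
A cautious way to do that inspection is to mount the partition read-only. The sketch below only prints the commands, and refuses if the device already appears as mounted. `inspect_cmds` and the /tmp/inspect mount point are hypothetical names, /dev/sdX1 is a placeholder for the real partition, and the reiserfs type is assumed from the rest of this thread.

```shell
# Print the commands for a read-only look at a data partition.
# Refuses if the device is already listed in the given /proc/mounts-style file.
inspect_cmds() {
    dev=$1
    mounts=${2:-/proc/mounts}
    if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "refusing: $dev is already mounted" >&2
        return 1
    fi
    echo "mkdir -p /tmp/inspect"
    echo "mount -o ro -t reiserfs $dev /tmp/inspect"
    echo "ls -la /tmp/inspect"
    echo "umount /tmp/inspect"
}

inspect_cmds /dev/sdX1
```

Run the printed commands by hand as root with the array stopped, and unmount again before starting the new config.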

Link to comment
