JustinChase Posted September 27, 2014
When I check the box and then hit "Stop", it appears to go through the steps to stop the array, but the array never shows as stopped in the GUI. It appears to have actually stopped the array, as none of the user shares are available now, but the GUI won't update, which means I can't put the array into maintenance mode.

I've had lots of issues with unformatted drives, red balls, the cache drive, etc. After getting an initial response from Tom via email asking for more info and telling me not to do anything, I've not heard anything else for about 3 days, so I guess I need to figure out a resolution on my own. That seems to need to start with me running reiserfsck on the unformatted drive, and for that I need to put the array into maintenance mode, which I'm unable to do. I wish I could get more help from Tom, but he's busy getting beta10 ready.

So, does anyone else know how to force the array into maintenance mode, so I can try to recover my drive? Thanks in advance.
jphipps Posted September 27, 2014
From a shell you can try "df -k" to check whether any filesystems are still mounted. My guess is that a process, such as a shell, has a file or directory in use. You might be able to manually unmount any leftover filesystems, or check which PID has the filesystem in use. Worst case, you may have to reboot to kill the process and then put the array in maintenance mode.
JustinChase Posted September 27, 2014
Not sure what this means, but disk9 is the unformatted disk in my array...

    root@media:~# df -k
    Filesystem      1K-blocks       Used  Available Use% Mounted on
    /dev/sda1         3781376     107376    3674000   3% /boot
    /dev/md9       2930177100 1505697032 1424480068  52% /mnt/disk9
jphipps Posted September 27, 2014
It must not really be unformatted, since it is still mounted. You can try "umount /mnt/disk9" and see if that unmounts it and lets the array finish going offline. If it is in use, you can use the fuser command to see which PID is active, but that doesn't always show anything.
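The checks suggested above could be scripted roughly as follows. This is only a sketch, assuming a standard Linux userland with awk and fuser (from psmisc) on the server; the function name list_array_mounts and the /mnt/disk*, /mnt/user* pattern are illustrative assumptions, not something unRAID provides.

```shell
# List mount points that look like array disks or user shares.
# $1: path to a file in /proc/mounts format (defaults to /proc/mounts).
# The /mnt/diskN and /mnt/user* naming is an assumption about the layout.
list_array_mounts() {
    table="${1:-/proc/mounts}"
    # Field 2 of each line is the mount point.
    awk '$2 ~ /^\/mnt\/(disk[0-9]+|user[0-9]*)$/ { print $2 }' "$table"
}

# Example, run as root on the server (not executed here):
#   for mp in $(list_array_mounts); do
#       echo "== $mp =="
#       fuser -vm "$mp"      # show processes using that filesystem, if any
#   done
#   umount /mnt/disk9        # then retry "Stop" in the GUI
```

If fuser shows nothing, as noted above, the leftover mount may simply be unmountable by hand.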
JustinChase Posted September 27, 2014
Thanks for the info. So, if it's not really unformatted, that should be 'good', right? How would I get unRAID to recognize it as such?

I've restarted the server a couple of times since it started showing this way, but from PuTTY (shutdown -r now), since I can't stop the array to do it from the GUI. I really don't want to make things worse than they already are.

I just saw someone question the fact that the upgrade now shows 10x1 as an option in the extensions screen, so maybe beta10 is ready for release now and Tom will get back to me. In the meantime my server is pretty useless to me, and by just waiting for Tom I'm wasting time I could spend on finding and fixing duplicates. I don't want to sound like I'm slighting Tom; getting the next beta ready is more important than one person's problem, but it's frustrating when you happen to be that one person with the problem.
jphipps Posted September 27, 2014
Actually, I guess the /mnt/diskX devices are really meta devices, so that could still be just the emulation of that drive. To see the real drive you would have to stop the array and mount the actual device to see what is really on it.

Yeah, it is always tough dealing with storage issues. There are usually several routes, and once you do something, you usually can't undo it...
JustinChase Posted September 27, 2014
Hmm, it's actually the meta drive I want to access, since I moved about half of the data off of that drive before it went to unformatted. If I 'fix' the actual drive, it will probably show all the files, including the ones I've already moved elsewhere. However, I'd rather deal with finding and fixing duplicates than have to figure out what I'm missing altogether and re-create the missing data.

Your idea to umount disk9 did work, and I was able to stop the array from the GUI afterwards. I decided to try restarting unRAID after this to see if it might 'find' disk9 again and show it as formatted, but that didn't work. When it restarted, it still showed as unformatted.

I guess I will get it stopped again, run reiserfsck on disk9, and get to work on fixing duplicates, assuming it finds anything on the disk at all.
jphipps Posted September 27, 2014
One thing you have to watch: if the disk is red-balled, then it is an emulated disk, so any repairs won't be written to the real physical disk unless you do a disk replacement and have unRAID rebuild the real disk from the virtual copy.
JustinChase Posted September 27, 2014
Hmmm, so if I run

    reiserfsck --check /dev/md9

are you saying that it will check the virtual/emulated disk, and not the actual hard drive? If so, that's actually a good thing for me, as I want to 'fix' the emulated disk, so I can get it to show only the files that remain after I moved about half of them off the emulated drive.

I guess I'll find out soon, since I did run that command, and it's processing as I type.

Thanks again for all your help with all my issues the last week or so!!
itimpi Posted September 27, 2014
JustinChase said: "hmmm, so if I run reiserfsck --check /dev/md9 are you saying that it will check the virtual/emulated disk, and not the actual hard drive? If so, that's actually a good thing for me, as I want to 'fix' the emulated disk, so I can get it to show only the files that remain after I moved about 1/2 of them off the emulated drive. I guess I'll find out soon, since I did run that command, and it's processing as I type. thanks again for all your help with all my issues the last week or so!!"

Yes - if a disk is red-balled then unRAID has stopped writing to it, and using the 'md' device is working against the emulated disk. You need to do a disk rebuild to get the data written to a real physical disk.
jphipps Posted September 27, 2014
Yeah, if it is red-balled, that would be the case: you are repairing the virtual copy. The only danger is that if you have another disk failure, you could lose that emulated copy.

No problem. I know how nerve-racking it is when you are in fear of losing data.
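To illustrate why a second failure would take the emulated copy with it, here is a toy sketch. It assumes single parity computed as a bytewise XOR across the data disks, which is how unRAID's parity is usually described; the function name emulate_byte and the sample byte values are made up for the example.

```shell
# Reconstruct one byte of a missing disk from the parity byte and the
# corresponding bytes of every surviving data disk (all given in decimal).
# $1: parity byte, remaining arguments: bytes from the surviving disks.
emulate_byte() {
    result=$1
    shift
    for b in "$@"; do
        result=$(( result ^ b ))
    done
    echo "$result"
}

# Three data disks hold 90, 60 and 240 at some offset, so parity is
# 90 ^ 60 ^ 240 = 150. If the disk holding 60 is red-balled:
#   emulate_byte 150 90 240     # prints 60
```

Every surviving disk feeds into the result, so an unreadable sector on any of them makes the matching sector of the emulated disk unreadable too.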
JustinChase Posted September 27, 2014
So, here are the results. I don't know what my best next action would be...

    root@media:~# reiserfsck --check /dev/md9
    reiserfsck 3.6.24

    Will read-only check consistency of the filesystem on /dev/md9
    Will put log info to 'stdout'

    Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
    ###########
    reiserfsck --check started at Sat Sep 27 10:32:37 2014
    ###########
    Replaying journal: Done.
    Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed
    Checking internal tree.. \/ 7 (of 16\/ 45 (of 103// 1 (of 170-block 190808067: The number of items (6) is incorrect, should be (0)
     the problem in the internal node occured (190808067), whole subtree is skipped
    / 12 (of 16// 27 (of 170// 98 (of 170\block 440476206: The level of the node (29666) is not correct, (1) expected
     the problem in the internal node occured (440476206), whole subtree is skipped
    finished
    Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
    Bad nodes were found, Semantic pass skipped
    2 found corruptions can be fixed only when running with --rebuild-tree
    ###########
    reiserfsck finished at Sat Sep 27 11:40:34 2014
    ###########
itimpi Posted September 27, 2014
You now want to run with the --rebuild-tree option to get a valid file system back. Most of the time this gets back virtually all the files, but if there was severe corruption then some will be missing.

There is another option that can be used with --rebuild-tree, which is --scan-whole-partition. This reads every sector on the disk looking for anything that looks like a file. It can recover more data than running without it, but it can also result in spurious (e.g. partial or deleted) files being found and added to the lost+found folder.
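As a rough aid for reading a saved --check log, the summary line can be matched to decide which mode to run next. This is a sketch only: the --rebuild-tree wording is taken from the output posted above, while the --fix-fixable and "No corruptions found" wordings are my recollection of reiserfsck's messages and should be treated as assumptions. The function name and log path are hypothetical.

```shell
# Read a reiserfsck --check log on stdin and print a suggested next step:
# rebuild-tree, fix-fixable, clean, or unknown.
reiserfsck_next_step() {
    log=$(cat)
    case "$log" in
        *"only when running with --rebuild-tree"*) echo "rebuild-tree" ;;
        *"running with --fix-fixable"*)            echo "fix-fixable" ;;
        *"No corruptions found"*)                  echo "clean" ;;
        *)                                         echo "unknown" ;;
    esac
}

# Example (the log path is hypothetical):
#   reiserfsck_next_step < /boot/reiserfsck-check.log
# For "rebuild-tree", the commands discussed in this thread are:
#   reiserfsck --rebuild-tree /dev/md9
#   reiserfsck --rebuild-tree --scan-whole-partition /dev/md9
```

The function only reads text; it never runs reiserfsck itself, so it is safe to try against a copied log.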
JustinChase Posted September 27, 2014
Thanks. The wiki is very specific that it has to be done 'right', so to confirm, I should run this...

    reiserfsck --rebuild-tree /dev/md9

exactly as shown above, correct?

Once done, it should show the drive in the GUI as formatted again, but still red-balled. That should allow me to copy/move files off of that disk onto another disk in my array, leaving disk9 essentially 'empty'. Once that's done, I should be able to format disk9 (as XFS), then it can be added back into the array, and I should be good to go again.

Does that sound about right, or am I missing something?

Thanks again.
jphipps Posted September 27, 2014
If all goes well and it comes back up correctly as a valid filesystem and is still red-balled, you still have to rebuild that onto a physical drive, either by replacing the drive and letting it rebuild, or by starting with a new config and letting it build or check parity.
itimpi Posted September 27, 2014
JustinChase said: "thanks. The wiki is very specific that it has to be done 'right', so to confirm I should run this... reiserfsck --rebuild-tree /dev/md9 exactly as shown above, correct? Once done, it should show the drive in the GUI as formatted again, but still red-balled, but that should allow me to copy/move files off of that disk onto another disk in my array, leaving disk9 essentially 'empty'. Once that's done, I should be able to format disk9 (as XFS), then it can be added back into the array, and I should be good to go again. Does that sound about right, or am I missing something? thanks again"

Basically correct. I'm not sure that simply reformatting the disk will remove the red-ball status, though. I suspect you may still have to go through a rebuild step to clear this state.

Note, however, that as far as I know you can do the reformat to XFS on the 'virtual' disk before the rebuild. This would have the advantage of being quick, and if you wanted to, you could start moving files back during the rebuild process (although it would slow down the rebuild). You may prefer to wait for the rebuild to finish.
JustinChase Posted September 27, 2014
Bad news, it seems...

    root@media:~# reiserfsck --rebuild-tree /dev/md9
    reiserfsck 3.6.24

    *************************************************************
    ** Do not run the program with --rebuild-tree unless       **
    ** something is broken and MAKE A BACKUP before using it.  **
    ** If you have bad sectors on a drive it is usually a bad  **
    ** idea to continue using it. Then you probably should get **
    ** a working hard drive, copy the file system from the bad **
    ** drive to the good one -- dd_rescue is a good tool for   **
    ** that -- and only then run this program.                 **
    *************************************************************

    Will rebuild the filesystem (/dev/md9) tree
    Will put log info to 'stdout'

    Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
    Replaying journal: Done.
    Reiserfs journal '/dev/md9' in blocks [18..8211]: 0 transactions replayed
    ###########
    reiserfsck --rebuild-tree started at Sat Sep 27 13:15:12 2014
    ###########

    Pass 0:
    ####### Pass 0 #######
    Loading on-disk bitmap .. ok, 376654954 blocks marked used
    Skipping 30567 blocks (super block, journal, bitmaps) 376624387 blocks will be read
    0%..block 67015897: The number of items (1) is incorrect, should be (0) - corrected
    block 67015897: The free space (17) is incorrect, should be (4072) - corrected
    block 68140233: The number of items (6144) is incorrect, should be (1) - corrected
    block 68140233: The free space (59648) is incorrect, should be (464) - corrected
    pass0: vpf-10110: block 68140233, item (0): Unknown item type found [201326593 385872640 0x49ffff00 (15)] - deleted
    block 70212653: The number of items (65521) is incorrect, should be (1) - corrected
    block 70212653: The free space (33) is incorrect, should be (2037) - corrected
    pass0: vpf-10110: block 70212653, item (0): Unknown item type found [4293459968 92209184 0xca908bf (15)] - deleted
    block 71292588: The number of items (1107) is incorrect, should be (1) - corrected
    block 71292588: The free space (165) is incorrect, should be (2144) - corrected
    pass0: vpf-10110: block 71292588, item (0): Unknown item type found [69599234 155844917 0x6540a03 (15)] - deleted
    block 72139248: The number of items (1) is incorrect, should be (0) - corrected
    block 72139248: The free space (7) is incorrect, should be (4072) - corrected
    block 73023393: The number of items (1024) is incorrect, should be (1) - corrected
    block 73023393: The free space (512) is incorrect, should be (3280) - corrected
    pass0: vpf-10110: block 73023393, item (0): Unknown item type found [67109119 50332160 0x1e000c00 (15)] - deleted
    block 73111698: The number of items (65414) is incorrect, should be (1) - corrected
    block 73111698: The free space (256) is incorrect, should be (3793) - corrected
    pass0: vpf-10110: block 73111698, item (0): Unknown item type found [67174655 277938189 0x20007 (15)] - deleted
    block 73344453: The number of items (1024) is incorrect, should be (1) - corrected
    block 73344453: The free space (768) is incorrect, should be (3792) - corrected
    pass0: vpf-10110: block 73344453, item (0): Unknown item type found [83886082 33555968 0xad000b00 (15)] - deleted
    block 73481095: The number of items (1) is incorrect, should be (0) - corrected
    block 73481095: The free space (4) is incorrect, should be (4072) - corrected
    block 73559406: The number of items (1) is incorrect, should be (0) - corrected
    block 73559406: The free space (65442) is incorrect, should be (4072) - corrected
    .block 76566909: The number of items (1280) is incorrect, should be (1) - corrected
    block 76566909: The free space (256) is incorrect, should be (3792) - corrected
    pass0: vpf-10110: block 76566909, item (0): Unknown item type found [50331905 16777472 0xde000e00 (15)] - deleted
    block 77580715: The number of items (1) is incorrect, should be (0) - corrected
    block 77580715: The free space (65480) is incorrect, should be (4072) - corrected
    block 79472804: The number of items (2304) is incorrect, should be (1) - corrected
    block 79472804: The free space (4352) is incorrect, should be (3536) - corrected
    pass0: vpf-10110: block 79472804, item (0): Unknown item type found [117441026 134222080 0x24001300 (15)] - deleted
    block 79475198: The number of items (768) is incorrect, should be (1) - corrected
    block 79475198: The free space (63744) is incorrect, should be (2768) - corrected
    pass0: vpf-10110: block 79475198, item (0): Unknown item type found [33554433 218103296 0x95000a00 (15)] - deleted
    block 80145536: The number of items (768) is incorrect, should be (1) - corrected
    block 80145536: The free space (256) is incorrect, should be (1488) - corrected
    pass0: vpf-10110: block 80145536, item (0): Unknown item type found [83886849 167772672 0x70000600 (15)] - deleted
    .20% left 292991467, 26021 /sec

    The problem has occurred looks like a hardware problem. If you have bad blocks, we advise you to get a new hard drive, because once you get one bad block that the disk drive internals cannot hide from your sight,the chances of getting more are generally said to become much higher (precise statistics are unknown to us), and this disk drive is probably not expensive enough for you to you to risk your time and data on it. If you don't want to follow that follow that advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for of that block number). If it cannot remap the block, use badblock option (-B) with reiserfs utils to handle this block correctly.

    bread: Cannot read the block (122410614): (Input/output error).

    Aborted

Suggestions?
itimpi Posted September 27, 2014
That is the commonest response from reiserfsck if a drive comes up as unformatted. You need to run it as suggested with --rebuild-tree to fix the issues. In most cases this clears the problem with virtually no data loss.

Note that at this point, since the drive is red-balled, you are writing to the emulated 'md9' device, not the physical drive. You still have the physical drive to fall back on (I would remove it from the server to play safe).
JustinChase Posted September 27, 2014
That was the result of running

    reiserfsck --rebuild-tree /dev/md9

Anything else I can try?
dgaschk Posted September 27, 2014
Post a screenshot of the unRAID Main page and a SMART report for disk9.
JustinChase Posted September 27, 2014
Screenshot attached.

I ran smartctl -a -d ata /dev/sdg >/boot/smart.txt, but did not manually force a SMART test recently. Not sure if this report is valid, or useful in this case.

    smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.16.2-unRAID] (local build)
    Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
    Device Model:     WDC WD30EZRX-00D8PB0
    Serial Number:    WD-WCC4N1252623
    LU WWN Device Id: 5 0014ee 25f94d9ce
    Firmware Version: 80.00A80
    User Capacity:    3,000,592,982,016 bytes [3.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2 (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Sat Sep 27 17:20:55 2014 EDT
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled

    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED

    General SMART Values:
    Offline data collection status:  (0x82) Offline data collection activity
                                            was completed without error.
                                            Auto Offline Data Collection: Enabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                                            without error or no self-test has ever
                                            been run.
    Total time to complete Offline
    data collection:                (40560) seconds.
    Offline data collection
    capabilities:                    (0x7b) SMART execute Offline immediate.
                                            Auto Offline data collection on/off support.
                                            Suspend Offline collection upon new
                                            command.
                                            Offline surface scan supported.
                                            Self-test supported.
                                            Conveyance Self-test supported.
                                            Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                                            power-saving mode.
                                            Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                                            General Purpose Logging supported.
    Short self-test routine
    recommended polling time:        (   2) minutes.
    Extended self-test routine
    recommended polling time:        ( 407) minutes.
    Conveyance self-test routine
    recommended polling time:        (   5) minutes.
    SCT capabilities:              (0x7035) SCT Status supported.
                                            SCT Feature Control supported.
                                            SCT Data Table supported.

    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
      3 Spin_Up_Time            0x0027   178   177   021    Pre-fail  Always       -       6058
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       170
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       697
     10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
    193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3836
    194 Temperature_Celsius     0x0022   122   113   000    Old_age   Always       -       28
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

    SMART Error Log Version: 1
    No Errors Logged

    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]

    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
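For a report like the one above, the raw values of a few attributes are usually what people look at first. The sketch below pulls them out of smartctl -a output; the choice of attributes (5, 197, 198, 199) is a common rule of thumb and my assumption, not something stated in this thread, and the function name smart_key_attrs is made up.

```shell
# Print NAME=RAW_VALUE for a few commonly watched SMART attributes.
# Reads "smartctl -a" output on stdin. Attribute rows have at least ten
# whitespace-separated fields with a hex FLAG in field 3, which keeps the
# numbered rows of the selective self-test table from matching.
smart_key_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 && $3 ~ /^0x/ { print $2 "=" $10 }'
}

# Example, run as root on the server (not executed here):
#   smartctl -a /dev/sdg | smart_key_attrs
#   smartctl -t long /dev/sdg    # start an extended self-test; the log above
#                                # shows none has been run yet
```

All four values are 0 in the report above, which fits the PASSED result, although no self-test has actually been run on this drive.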
JustinChase Posted September 27, 2014
So, I assume the drive is screwed beyond remedy, or at least the parity representation of it; correct?

If I just pull the actual drive, stick it in another machine, run a parity check on it, then stick it back into the server, I still couldn't rebuild the parity representation onto this drive, correct?

Is there anything I can do at this point to retrieve the data off of disk9, or do I just need to move on?

I know I probably sound desperate, but I'm sick of not having my server, and I'm disappointed that Tom has not responded to my requests for help, so I really just want to put this behind me and get whatever remains of my data back into use, as some is better than nothing.

Any ideas?
jphipps Posted September 28, 2014
I would stop the array completely and mount the /dev/sdX1 partition, and you should be able to see the data on the drive. If the drive mounts fine and the data on it looks good, I think I would just start a new config and put all the drives back in as they were, and at least you should have all the original disk9 data in the array. Then you would just need to worry about the dups.
JustinChase Posted September 28, 2014
How do I "...mount the /dev/sdX1 partition and you should be able to see the data on the drive"?

I tried...

    root@media:/mnt# mount /dev/sdg1

and got...

    mount: can't find /dev/sdg1 in /etc/fstab or /etc/mtab
jphipps Posted September 28, 2014
You have to give it a mount point and possibly a filesystem type:

    mkdir /mnt/somedisk
    mount /dev/sdg1 /mnt/somedisk

If it says you need a filesystem type, just add -t {filesystem type} after mount.
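The two commands above can be wrapped in a small helper with a dry-run switch. Mounting read-only (-o ro) is my own assumption about what is safest while the disk is only being inspected, not something suggested in the thread; the function name mount_readonly and the DRY_RUN variable are hypothetical.

```shell
# Mount a partition read-only at a scratch mount point for inspection.
# $1: device, $2: mount point, $3: optional filesystem type.
# With DRY_RUN=1 the commands are printed instead of executed.
mount_readonly() {
    dev=$1
    dir=$2
    fstype=${3:-}
    run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then
        run="echo"
    fi
    $run mkdir -p "$dir" || return 1
    if [ -n "$fstype" ]; then
        $run mount -o ro -t "$fstype" "$dev" "$dir"
    else
        $run mount -o ro "$dev" "$dir"
    fi
}

# Example, run as root with the array stopped (not executed here):
#   DRY_RUN=1 mount_readonly /dev/sdg1 /mnt/somedisk reiserfs   # preview
#   mount_readonly /dev/sdg1 /mnt/somedisk reiserfs
#   ls /mnt/somedisk
#   umount /mnt/somedisk
```

A read-only mount should leave the disk contents untouched, which matters if the drive is later put back into the array.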
Archived
This topic is now archived and is closed to further replies.